Hacker News

That is certainly an approach. But wouldn't that make indexOf() and lastIndexOf() differ in cases where you would intuitively expect them to be the same, e.g.:

0-based indexOf "a" in "a" would be 0

0-based lastIndexOf "a" in "a" would be 1

?



No, because “the index of a string within a string” would be consistently defined to always be the lower index (or, alternatively, always the upper index). There is no intrinsic relation to the search direction, since all characters of the search string have to be compared regardless of the direction. For example, when searching from the end, the search algorithm could start comparing at index `totalLength - searchStringLength`.
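A minimal TypeScript sketch of that idea, assuming the "lower index" convention. The helper names `matchesAt`, `firstIndex` and `lastIndex` are made up for illustration. The backward scan starts at `haystack.length - needle.length`, compares all characters of the needle from its first one, and so reports the same kind of index as the forward scan:

```typescript
// Hypothetical helpers: both return the LOWER index (the position of the
// needle's first character), or -1 when the needle does not occur.

// True if every character of `needle` matches `haystack` starting at `start`.
function matchesAt(needle: string, haystack: string, start: number): boolean {
  for (let j = 0; j < needle.length; j++) {
    if (haystack.charAt(start + j) !== needle.charAt(j)) return false;
  }
  return true;
}

// Scans forward from index 0.
function firstIndex(needle: string, haystack: string): number {
  for (let i = 0; i + needle.length <= haystack.length; i++) {
    if (matchesAt(needle, haystack, i)) return i;
  }
  return -1;
}

// Scans backward, starting at totalLength - searchStringLength,
// the last position where the whole needle still fits.
function lastIndex(needle: string, haystack: string): number {
  for (let i = haystack.length - needle.length; i >= 0; i--) {
    if (matchesAt(needle, haystack, i)) return i;
  }
  return -1;
}

console.log(firstIndex("a", "a"), lastIndex("a", "a")); // 0 0
console.log(firstIndex("a", "aba"), lastIndex("a", "aba")); // 0 2
```

With this convention, "a" in "a" gives 0 for both directions; only the search order differs.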

To further illustrate, it would be the index where, if the search string is removed from the found position, it would have to be inserted in order to revert to the original string.
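The removal/re-insertion check can be sketched as follows. The helpers `removeAt` and `insertAt` are hypothetical and assume the needle really does start at the given index. Using the lower index restores the original; the upper index (one past the end of the match) does not, in general:

```typescript
// Hypothetical helper: removes the occurrence of `needle` that starts at `index`.
function removeAt(s: string, index: number, needle: string): string {
  return s.substring(0, index) + s.substring(index + needle.length);
}

// Hypothetical helper: inserts `needle` so that it starts at `index`.
function insertAt(s: string, index: number, needle: string): string {
  return s.substring(0, index) + needle + s.substring(index);
}

const original = "aba";
const needle = "a";
const found = original.indexOf(needle); // 0, the lower index of the first match
const removed = removeAt(original, found, needle); // "ba"

console.log(insertAt(removed, found, needle)); // "aba": original restored
// Re-inserting at the upper index (found + needle.length = 1) gives a different string.
console.log(insertAt(removed, found + needle.length, needle)); // "baa"
```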


>it would be the index where, if the search string is removed from the found position, it would have to be inserted in order to revert to the original string.

Marinated on this a bit more. I think this is the most straightforward and internally consistent way to think about it. Thanks for the insight.

So I figure based on this:

1-based first index of v1 in v2 is:

| v1 | v2 | IndexOf(v1,v2) |
|-------|-------------|----------------|
| [] | [] | 1 |
| aba | [] | N/A |
| [] | aba | 1 |
| a | a | 1 |
| a | aba | 1 |
| x | y | N/A |
| world | hello world | 7 |

Where [] = the empty string.

This is the same as Excel FIND(). Ignoring the difference in indexing, it also matches JavaScript indexOf(): "".indexOf("") returns 0 there (position 1 in 1-based terms), not -1.
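A minimal sketch of the 1-based IndexOf from the first table, assuming `null` stands in for N/A. The name `indexOf1` is hypothetical. It scans left to right and reports the lower index, converted to 1-based:

```typescript
// Hypothetical helper: 1-based position of the first occurrence of `v1` in
// `v2`, or null standing in for N/A. Candidate positions run from 1 to
// v2.length - v1.length + 1, so an empty v1 is found at 1 even in an empty v2.
function indexOf1(v1: string, v2: string): number | null {
  for (let i = 0; i + v1.length <= v2.length; i++) {
    if (v2.substring(i, i + v1.length) === v1) return i + 1; // convert to 1-based
  }
  return null;
}

// The rows of the first table: [v1, v2, expected result].
const firstTable: Array<[string, string, number | null]> = [
  ["", "", 1],
  ["aba", "", null],
  ["", "aba", 1],
  ["a", "a", 1],
  ["a", "aba", 1],
  ["x", "y", null],
  ["world", "hello world", 7],
];

for (const [v1, v2, expected] of firstTable) {
  const actual = indexOf1(v1, v2);
  console.log(JSON.stringify(v1), JSON.stringify(v2), actual, actual === expected ? "ok" : "MISMATCH");
}
```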

1-based last index of v1 in v2 is:

| v1 | v2 | LastIndexOf(v1,v2) |
|-------|-------------|--------------------|
| [] | [] | 1 |
| aba | [] | N/A |
| [] | aba | 4 |
| a | a | 1 |
| a | aba | 3 |
| x | y | N/A |
| world | hello world | 7 |

This also matches JavaScript lastIndexOf() (ignoring the difference in indexing): "".lastIndexOf("") returns 0 there, not -1.
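The matching sketch for the second table, again assuming `null` for N/A and using a hypothetical name, `lastIndexOf1`. It scans right to left from the last position where v1 still fits, but still reports the lower (start) index, so an empty v1 in "aba" lands on 4:

```typescript
// Hypothetical helper: 1-based position (lower index) of the last occurrence
// of `v1` in `v2`, or null standing in for N/A.
function lastIndexOf1(v1: string, v2: string): number | null {
  // Start at the last 0-based position where all of v1 still fits.
  for (let i = v2.length - v1.length; i >= 0; i--) {
    if (v2.substring(i, i + v1.length) === v1) return i + 1; // convert to 1-based
  }
  return null;
}

// The rows of the second table: [v1, v2, expected result].
const lastTable: Array<[string, string, number | null]> = [
  ["", "", 1],
  ["aba", "", null],
  ["", "aba", 4],
  ["a", "a", 1],
  ["a", "aba", 3],
  ["x", "y", null],
  ["world", "hello world", 7],
];

for (const [v1, v2, expected] of lastTable) {
  const actual = lastIndexOf1(v1, v2);
  console.log(JSON.stringify(v1), JSON.stringify(v2), actual, actual === expected ? "ok" : "MISMATCH");
}
```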


If you add four spaces at the beginning of a line, Hacker News will treat it as preformatted. Here are your tables formatted that way, for ease of reading.

---

1-based first index of v1 in v2 is:

    |  v1   |     v2      | IndexOf(v1,v2) |
    |-------|-------------|----------------|
    | []    | []          | 1              |
    | aba   | []          | N/A            |
    | []    | aba         | 1              |
    | a     | a           | 1              |
    | a     | aba         | 1              |
    | x     | y           | N/A            |
    | world | hello world | 7              |


1-based last index of v1 in v2 is:

    |  v1   |     v2      | LastIndexOf(v1,v2) |
    |-------|-------------|--------------------|
    | []    | []          | 1                  |
    | aba   | []          | N/A                |
    | []    | aba         | 4                  |
    | a     | a           | 1                  |
    | a     | aba         | 3                  |
    | x     | y           | N/A                |
    | world | hello world | 7                  |


Didn't know that. Thanks!


So what result would you return for:

0-based indexOf "" in "abc"

0-based lastIndexOf "" in "abc"

?


For the empty string, lastIndexOf would return 3, because the empty string is present at all positions. (When taking the substring (i, i) for any index i, the result is the empty string.) This assumes that the index “after the last character” is considered a valid index for that purpose, which it typically is. (Where can I place the cursor within the delimiters of "abc"? There are four positions where I can place it, 0–3.) Otherwise, 2 would be the last position.
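The cursor-position argument can be checked directly. A small sketch (the helper name `emptyMatchPositions` is made up for illustration) that lists every index i from 0 to s.length where `s.substring(i, i)` is the empty string; the first and last entries are what indexOf and lastIndexOf would report. JavaScript's built-in methods give the same answers for "abc":

```typescript
// Hypothetical helper: every index i in 0..s.length where the empty string
// occurs, i.e. where s.substring(i, i) === "". These are the cursor positions.
function emptyMatchPositions(s: string): number[] {
  const positions: number[] = [];
  for (let i = 0; i <= s.length; i++) {
    if (s.substring(i, i) === "") positions.push(i);
  }
  return positions;
}

const cursorPositions = emptyMatchPositions("abc");
console.log(cursorPositions); // [ 0, 1, 2, 3 ]
// First and last match: what indexOf and lastIndexOf should report.
console.log(cursorPositions[0], cursorPositions[cursorPositions.length - 1]); // 0 3
// The built-in JavaScript methods agree.
console.log("abc".indexOf(""), "abc".lastIndexOf("")); // 0 3
```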


>To further illustrate, it would be the index where, if the search string is removed from the found position, it would have to be inserted in order to revert to the original string.

I guess that is a good way to look at it. It would mean:

indexof "" in "abca" is 0

indexof "abca" in "" is invalid

indexof "" in "" is 0

Which feels unintuitive.


If you feel that ‘indexof "" in "" is 0’ is unintuitive, consider that indexOf s in s is zero for all strings s (including for the empty string).

Similarly, indexOf st in s (where st is s followed by t) is invalid for all strings s whenever t is non-empty, that is, whenever s is a proper prefix of the search string. Your example ‘indexof "abca" in "" is invalid’ is just one case of that (with s = empty string and t = "abca"), completely analogous to ‘indexof "babca" in "b" is invalid’.

So I’d say your intuition needs adjusting. ;)
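Both general cases can be spot-checked with a plain forward search. This is a sketch with a hypothetical `indexOf0(needle, haystack)` helper that uses JavaScript's -1 convention for "not found"; the sample strings are the ones from the discussion:

```typescript
// Hypothetical helper: 0-based first index of `needle` in `haystack`, or -1.
function indexOf0(needle: string, haystack: string): number {
  for (let i = 0; i + needle.length <= haystack.length; i++) {
    if (haystack.substring(i, i + needle.length) === needle) return i;
  }
  return -1;
}

const t = "abca"; // any non-empty t behaves the same way
for (const s of ["", "b", "abca"]) {
  // indexOf s in s: always 0, including for the empty string.
  // indexOf st in s: always -1, because s + t is longer than s.
  console.log(JSON.stringify(s), indexOf0(s, s), indexOf0(s + t, s));
}
```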


Maybe. ;0)

Your approach seems to be consistent. Unfortunately I think it is too complicated to explain to my non-programmer users.



