That is certainly an approach. But wouldn't that make indexOf() and lastIndexof(...

layer8 · on Dec 14, 2023

No, because “the index of a string within a string” would be consistently defined to always be the lower index (or, alternatively, always the upper index). There is no intrinsic relation to the search direction, since all characters of the search string have to be compared regardless of the direction. For example, when searching from the end, the search algorithm could start comparing at index `totalLength - searchStringLength`.
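A minimal Python sketch of that point, assuming 0-based indexing; the helper name `last_index_of` is made up for illustration. The scan runs from the end, starting at `len(haystack) - len(needle)`, yet it still reports the lower index of the match:

```python
from typing import Optional


def last_index_of(haystack: str, needle: str) -> Optional[int]:
    """Return the 0-based lower index of the last occurrence, or None."""
    # The last position where the needle could still fit.
    start = len(haystack) - len(needle)
    # Walk backward; if the needle is longer than the haystack, the range is empty.
    for i in range(start, -1, -1):
        # Every character of the needle is compared, whatever the direction.
        if haystack[i:i + len(needle)] == needle:
            return i
    return None
```

Searching for "world" in "hello world" this way compares at index 6 first and returns 6, the same index a forward search would report for that occurrence.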

To further illustrate, it would be the index where, if the search string is removed from the found position, it would have to be inserted in order to revert to the original string.
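That remove-then-reinsert property can be checked with a short Python sketch. The helpers `remove_at` and `insert_at` are hypothetical names introduced here, and indexing is assumed to be 0-based:

```python
def remove_at(s: str, index: int, length: int) -> str:
    """Delete `length` characters from `s`, starting at `index`."""
    return s[:index] + s[index + length:]


def insert_at(s: str, index: int, piece: str) -> str:
    """Insert `piece` into `s` so that it starts at `index`."""
    return s[:index] + piece + s[index:]


original, needle = "hello world", "world"
index = original.rfind(needle)  # lower index of the last match
stripped = remove_at(original, index, len(needle))
restored = insert_at(stripped, index, needle)  # same index restores the original
```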

hermitcrab · on Dec 14, 2023

>it would be the index where, if the search string is removed from the found position, it would have to be inserted in order to revert to the original string.

Marinated on this a bit more. I think this is the most straightforward and internally consistent way to think about this. Thanks for the insight.

So I figure based on this:

1-based first index of v1 in v2 is:

| v1 | v2 | IndexOf(v1,v2) |

|-------|-------------|----------------|

| [] | [] | 1 |

| aba | [] | N/A |

| [] | aba | 1 |

| a | a | 1 |

| a | aba | 1 |

| x | y | N/A |

| world | hello world | 7 |

Where []=empty.

This is the same as Excel FIND() and differs from Javascript indexOf() (ignoring difference in indexing) only for "".indexOf("") which returns -1 (N/A).

1-based last index of v1 in v2 is:

| v1 | v2 | LastIndexOf(v1,v2) |

|-------|-------------|--------------------|

| [] | [] | 1 |

| aba | [] | N/A |

| [] | aba | 4 |

| a | a | 1 |

| a | aba | 3 |

| x | y | N/A |

| world | hello world | 7 |

This differs from Javascript lastIndexOf() (ignoring difference in indexing) only for "".lastIndexOf("") which returns -1 (N/A).
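For anyone who wants to check the two tables mechanically, here is a rough Python sketch. The function names are hypothetical, `None` stands in for N/A, and it relies on Python's `str.find` and `str.rfind`, which return -1 when nothing is found:

```python
from typing import Optional


def index_of_1based(v1: str, v2: str) -> Optional[int]:
    """1-based first index of v1 in v2, or None for N/A."""
    pos = v2.find(v1)  # 0-based, -1 if absent
    return None if pos == -1 else pos + 1


def last_index_of_1based(v1: str, v2: str) -> Optional[int]:
    """1-based last index of v1 in v2, or None for N/A."""
    pos = v2.rfind(v1)  # 0-based, -1 if absent
    return None if pos == -1 else pos + 1


# Rows of the tables above as (v1, v2, first, last); "" is the empty string.
rows = [
    ("", "", 1, 1),
    ("aba", "", None, None),
    ("", "aba", 1, 4),
    ("a", "a", 1, 1),
    ("a", "aba", 1, 3),
    ("x", "y", None, None),
    ("world", "hello world", 7, 7),
]
mismatches = [
    (v1, v2)
    for v1, v2, first, last in rows
    if index_of_1based(v1, v2) != first or last_index_of_1based(v1, v2) != last
]
```

`mismatches` stays empty, so these wrappers agree with every row in both tables.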

jmholla · on Dec 14, 2023

If you add four spaces at the beginning of a line, Hacker News will treat it as preformatted. Here are your tables formatted that way, just for ease of reading.

---

1-based first index of v1 in v2 is:

    |  v1   |     v2      | IndexOf(v1,v2) |
    |-------|-------------|----------------|
    | []    | []          | 1              |
    | aba   | []          | N/A            |
    | []    | aba         | 1              |
    | a     | a           | 1              |
    | a     | aba         | 1              |
    | x     | y           | N/A            |
    | world | hello world | 7              |

1-based last index of v1 in v2 is:

    |  v1   |     v2      | LastIndexOf(v1,v2) |
    |-------|-------------|--------------------|
    | []    | []          | 1                  |
    | aba   | []          | N/A                |
    | []    | aba         | 4                  |
    | a     | a           | 1                  |
    | a     | aba         | 3                  |
    | x     | y           | N/A                |
    | world | hello world | 7                  |

hermitcrab · on Dec 15, 2023

Didn't know that. Thanks!

hermitcrab · on Dec 14, 2023

So what result would you return for:

0-based indexOf "" in "abc"

0-based lastIndexOf "" in "abc"

?

layer8 · on Dec 14, 2023

For the empty string, lastIndexOf would return 3, because the empty string is present at all positions. (When taking the substring (i, i) for any index i, the result is the empty string.) This assumes that the index “after the last character” is considered a valid index for that purpose, which it typically is. (Where can I place the cursor within the delimiters of "abc"? There are four positions where I can place it, 0–3.) Otherwise, 2 would be the last position.
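The cursor-position argument can be illustrated in Python, assuming slices stand in for "substring (i, i)" and that the index after the last character counts as valid:

```python
s = "abc"

# Every cursor position i, including the one after the last character,
# yields the empty string when taking the slice from i to i.
cursor_positions = [i for i in range(len(s) + 1) if s[i:i] == ""]

first = s.find("")   # 0-based first index of the empty string
last = s.rfind("")   # 0-based last index of the empty string
```

Here `cursor_positions` holds the four positions 0 through 3, and Python's built-in search agrees: `first` is 0 and `last` is 3.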

hermitcrab · on Dec 14, 2023

>To further illustrate, it would be the index where, if the search string is removed from the found position, it would have to be inserted in order to revert to the original string.

I guess that is a good way to look at it. It would mean:

indexof "" in "abca" is 0

indexof "abca" in "" is invalid

indexof "" in "" is 0

Which feels unintuitive.

layer8 · on Dec 14, 2023

If you feel that ‘indexof "" in "" is 0’ is unintuitive, consider that indexOf s in s is zero for all strings s (including for the empty string).

Similarly, indexOf st in s is invalid for all strings s if t is non-empty, that is if s is a proper prefix of the search string. Your example ‘indexof "abca" in "" is invalid’ is just one case of that (with s = empty string and t = "abca"), completely analogous to ‘indexof "babca" in "b" is invalid’.
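Both rules are easy to spot-check in Python. This is only a sketch: `index_of` is a hypothetical 0-based wrapper around `str.find`, with `None` meaning invalid:

```python
from typing import Optional


def index_of(haystack: str, needle: str) -> Optional[int]:
    """0-based index of needle in haystack, or None if it is not found."""
    pos = haystack.find(needle)
    return None if pos == -1 else pos


samples = ["", "b", "abca"]

# indexOf s in s: the string always matches itself at position 0.
self_matches = [index_of(s, s) for s in samples]

# indexOf st in s with non-empty t: the search string is longer than s.
t = "abca"
longer_matches = [index_of(s, s + t) for s in samples]
```

Every entry of `self_matches` is 0, the empty string included, and every entry of `longer_matches` is `None`.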

So I’d say your intuition needs adjusting. ;)

hermitcrab · on Dec 14, 2023

Maybe. ;0)

Your approach seems to be consistent. Unfortunately I think it is too complicated to explain to my non-programmer users.