Python has str.casefold() for caseless comparisons that handles the example in t...

mjn · on Nov 22, 2021

I believe this does work for German (or at least I can't think of an example where it doesn't). But a case I can think of where it doesn't work is with standard modern Greek. In all-caps words, accents are omitted, while at least Python's implementation of casefold produces non-equal strings for the all-caps and lowercase versions:

    >>> str.casefold("παράδειγμα")
    'παράδειγμα'

    >>> str.casefold("ΠΑΡΑΔΕΙΓΜΑ")
    'παραδειγμα'

It's at least consistent with this, though:

    >>> str.upper("παράδειγμα")
    'ΠΑΡΆΔΕΙΓΜΑ'

But I'd consider that incorrect, or at least nonstandard. The Greek alphabet does have accented versions of capital letters, but they can only be used as the first letter of a word in mixed case (e.g. if a sentence starts with έλα, you write it Έλα), never in the middle of a capitalized word. However maybe this slides too far to the "language" rather than "encoding" side of the space that Unicode considers outside of its purview.

zvr · on Nov 22, 2021

Yes, the Python standard library (like many other implementations) chose to do the "easy" way instead of the "correct" one.

Correct case-folding in Greek is complicated, since it might introduce diaeresis in the next vowel, if the accents are removed: Μάιος - ΜΑΪΟΣ.

All this means that correct handling is not reversible, which introduces a slew of other problems.