
I have an apostrophe in my name and it causes allll sorts of issues like this. (Think Bobby Tables). To the point where I’m pretty convinced that the internet is going to wipe out apostrophes from people’s actual names. In fact I just omitted it in my most recent drivers license.


Ouch. I have a ü in my last name from family in a different country way back somewhere. It doesn't crash systems as much as an apostrophe would, but it's very good at showing encoding issues between systems.

It's not as big an issue as it used to be, at least. I've previously had online transactions fail because of a mismatch between my name (with ü) and the name on the card (with u). The systems seem more forgiving now, presumably because they handle that case. I also remember being a bit scared traveling to Japan many years ago, as we were told it was SOoo important that the names and everything matched to gain entry. And then the name on my ticket was completely mangled. But no one cared.

Here's a SO post about someone with the last name Null: https://stackoverflow.com/q/4456438/923847


The Japanese are quite used to mojibake [1], so they would've understood immediately that the mismatch between your ticket and passport was caused by encoding issues.

[1] https://en.wikipedia.org/wiki/Mojibake


Interestingly, I've had problems in Korea (Gimpo Airport) because my name contains an "ö", and the canonical spelling in the passport for this is "oe". This was cause for much confusion among the airport staff.

I would have thought that people from CJK-countries were more understanding of encoding-to-latin weirdness than most, but apparently not.


I think their understanding would be focused on the encoding for their language and a relatively narrow set of problems. I've encountered name issues in CJK countries that keep names in native encoding due to an assumption that full names fit within a couple of characters with no need for any spaces or punctuation. Some systems might be designed to be "accommodating" and take even up to 8 or 10 characters! There was one train system where my name had at least four different iterations through the tickets I collected, with different ordering of first and last names and truncating.


In defense of the Korean airport staff, they might have been more accommodating if the "ö" was completely and obviously broken, like "£‡�". Spelling it as "oe" makes it look like there are no encoding issues, in which case strict checking makes more sense.

It's much easier to identify mojibake (they tend to be extremely obvious in CJK encodings) than to remember canonical spellings and other variations in a whole bunch of different languages. Airport staff probably know that "oe" and "œ" are interchangeable, but that's about it.


Diacritics are usually stripped in air travel. In Hungarian we have many letters with diacritics, but it is never a problem that the passport has them and the system doesn't.


> Diacritics are usually stripped

Not in all cases. In Germany and Finland (maybe all EU passports???) ä is spelled ae, ö is spelled oe in the machine readable part (umlauts shown in the "human-readable" part). This is important to know when you need a visa.

For Germans this is not a big problem, because this is how umlauts have always been written when they are unavailable for technical reasons. For Finns this is a problem, because this "transcription" is completely unknown in Finnish. For a couple of weeks now it has been possible to get an electronic visa for Russia on the internet. Reportedly many Finns with an ä in their name (that's not uncommon) dropped the dots when applying for their visa, because an ä is not accepted. At the border they were not allowed to enter, because the machine-readable part of the passport has ae instead.


Good point, I don't know any Hungarians with ü or ö in their name, just á and é.

I do wonder what happens to ű and ő though.


There is an ICAO recommendation. However, it is not unambiguous and of course it's not legally binding. So in the end every country decides what to do. (Possibly there are more multinational agreements, e.g. inside the EU, but I doubt there is anything truly worldwide.)

https://www.icao.int/publications/Documents/9303_p3_cons_en....

Ü is written as UE, UXX or U

Ű is written as U

According to https://en.wikipedia.org/wiki/Machine-readable_passport#Name... Hungary uses UE for Ü, but there is no reference given. According to the same article, Russia even uses two different transliteration systems depending on the type of document.
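
To make the ambiguity concrete, here is a minimal Python sketch of the two conventions discussed in this subthread: stripping diacritics versus expanding umlauts to two letters. The `EXPAND` table is a hand-picked assumption covering only the ä/ö/ü cases mentioned here, not the full ICAO table, and the function names are made up for illustration.

```python
import unicodedata

# Hypothetical, incomplete table: only the umlauts mentioned in this thread.
EXPAND = {"Ä": "AE", "Ö": "OE", "Ü": "UE"}

def strip_diacritics(name: str) -> str:
    """Decompose (NFD), drop combining marks, then uppercase."""
    decomposed = unicodedata.normalize("NFD", name)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return base.upper()

def expand_umlauts(name: str) -> str:
    """Expand Ä/Ö/Ü to two letters, then strip any remaining diacritics."""
    # NFC first, so that "u" + combining diaeresis is seen as a single "ü".
    upper = unicodedata.normalize("NFC", name).upper()
    expanded = "".join(EXPAND.get(c, c) for c in upper)
    return strip_diacritics(expanded)

for name in ["Müller", "Szűcs", "Szilási"]:
    print(name, strip_diacritics(name), expand_umlauts(name))
```

The same passport name yields `MULLER` under one convention and `MUELLER` under the other, which is exactly the kind of mismatch a strict comparison would reject. Ű has no entry in the table, so both functions reduce it to a plain U.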


For German names, this is a problem. I have an ü in my name and this is transcribed as a "ue" in my passport. Transcribing it as u would produce a different name (which AFAIK actually exists).


In Hungarian the diacritics are also important, for example Szilasi and Szilási are different and are pronounced differently. Still, it won't be an issue when flying or other stuff.

German is more complicated though with all the substitution rules.

Not to mention Germans who actually have an ue in their name, still pronounced as ü, but written as ue only, never as ü. Or someone may be called Gross, but it would be incorrect to write it as Groß, while someone else's name may be Groß with the acceptable alternative spelling Gross when ß is unavailable.


I too have an apostrophe in my name and experience the same thing. I've had people put it into their system as a comma, dash, space, and all sorts of weirdness despite my calling it out specifically.

My experience has actually improved substantially in the last 10 years or so, and most of the government systems I encounter these days actually handle it properly (as well as handling suffix properly too). That said, I've somewhat recently started having trouble checking in for flights again -- I flew last month and it took the ticketing agent >20 minutes to find my reservation on both the outbound and return flights, even despite my providing the 'confirmation code' / itinerary email (we were checking bags & flying with infants, else I'd have done online check-in).

It can be really frustrating -- though I'm hopeful it will continue improving and hopefully be a smoother experience by the time my kids are adults.


> I've had people put it into their system as a comma, dash, space, and all sorts of weirdness despite my calling it out specifically.

Ugh, yes. And it's insane how many people seem to just NOT KNOW what an apostrophe is.

> checking in for flights

Yea, airlines seem to be one of the worst offenders. I have Precheck but Spirit in particular is never able to match the name on my ticket to the name in the gov't database so I never get it. Just one more reason to avoid flying them I guess.


Out of curiosity, why not just omit the apostrophe for airline reservations then? I understand wanting your full, real name in many circumstances, but who cares about what the boarding pass says as long as you get to fly? I doubt the people checking ID would care about the missing apostrophe.


I often did do this when I used to fly more often domestically, but it tends to cause other issues -- the primary one being a "frequent-flyer/mileage account name mismatch" which means that I have to undertake some manual process to collect my miles. I've lost out on countless 'airline miles' as a result via forgetting to do the manual process within N days after the trip.

Similarly, automated check-in kiosks are then usually unable to find the reservation via credit-card or passport scan -- meaning you're back to looking up the reservation code, and even that often fails, as if the apostrophe just flat-out causes issues with the query/lookup or something.

It can be very frustrating, and I'm increasingly often impressed (and vocalize the same) when I spell my name and the agent enters it correctly AND the system flawlessly handles it, too! The DMV systems in my state are one such example where I used to have issues but, in recent years, the problem appears to have been wholly addressed/handled.


Update your name in the mileage program to take the apostrophe out. They might be able to do it


Practically speaking, that makes sense. Philosophically, it's abhorrent. Blaming the user is bad behavior in general. Expecting a person to alter their name to conform to a poor software implementation is just wrong.


Well, people with names that are not written with Latin script are coerced into whatever Latin transliteration their government uses when issuing passports. Bonus points for altering the transliteration rules from time to time.


But ID checking is also done electronically at some checkpoints. If your ticket doesn't match your passport, your visa, your visa waiver, etc., you are going to be in trouble.

That being said last time I went to the US the person booking the ticket swapped my first name and last name. Only the person at the baggage dropoff noticed it, and after much deliberation they suggested to leave it that way. I went through with no issues apart from not being able to register the mileage.


He shouldn't have to. He is a human being. Computers were made to serve us, not the other way around.


I have a hyphen in my first name that also causes problems. I love it when I put my name in and the web site says "invalid first name." Thanks mom and dad...

What is worse is people who "fix" my name by moving the second half of my first name and making it part of my last name. I'm an adult. I know what my name is.


In Quebec, composite first names (prénoms composés) like Marc-Antoine are pretty common, so there was nothing weird about my parents giving me such a name. And frankly, most webforms I had to fill out while living there accepted my name just fine.

However, now that I've moved to the United States, it's been a bit of an annoyance.


I have a double first name, so I have a space in my first name. Many people / systems seem to think I accidentally put my middle name in the first name field and helpfully move or drop the second part. Putting a hyphen in (which is not really supposed to be there) typically fixes it, so I'm variously known with a hyphenated and non-hyphenated name. But it rarely causes issues.
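
For what it's worth, a form check that accepts the hyphenated, spaced, apostrophe and accented names mentioned in this thread does not take much code. This is only a rough sketch under my own assumptions: the function name and the `ALLOWED_PUNCTUATION` set are invented for illustration, and a real system may well need to allow more than this (or skip the check entirely).

```python
import unicodedata

# Assumed set of punctuation to allow; adjust for your own users.
ALLOWED_PUNCTUATION = {"'", "\u2019", "-", " ", "."}

def is_plausible_name(name: str) -> bool:
    """Accept letters from any script, combining marks, and a little punctuation."""
    name = name.strip()
    if not name:
        return False
    for ch in name:
        category = unicodedata.category(ch)
        # "L*" covers letters in any script, "M*" covers combining marks.
        if category.startswith("L") or category.startswith("M"):
            continue
        if ch in ALLOWED_PUNCTUATION:
            continue
        return False
    return True

for candidate in ["Marc-Antoine", "Mary Ann", "O'Brien", "Léon", ""]:
    print(repr(candidate), is_plausible_name(candidate))
```

The point of checking Unicode categories instead of `[A-Za-z]` is that the check stays agnostic about which alphabet the name uses.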


I write very strongly-worded letters to those companies. Honestly, I wish someone publicly shamed all those stupid companies.


My name is officially spelled as Léon.

The letter e with an acute accent causes all sorts of UTF-8 encoding issues with many services, not just airlines. If you interpret the UTF-8 é (0xC3A9) as ASCII it becomes à (0xC3) + © (0xA9), so my name often comes out as 'Léon'.

Airlines make it worse, because they strip both characters during sanity checking, so my name comes out as 'Lon', which has caused me problems a couple of times as the name on my passport did not match the name on the ticket.


Reminds me of the ode to a shipping label:

http://i.imgur.com/4J7Il0m.jpg

What these things all reinforce is that a lot of programmers take text encoding as a given, and don’t realize all the potential places for errors to sneak in.


Could be a fun way to hunt for buffer overflows on internal shipping services. Just fill out the sender name field to just "óóóóóóóóóóóóóóóóóóóóóóó" and let it expand. If the parcel arrives, not vulnerable. If the packet doesn't arrive, you've found a vulnerability... somewhere...
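
The "let it expand" part is easy to quantify. A small Python sketch, assuming each hop in the pipeline misreads UTF-8 bytes as Latin-1 and then re-encodes to UTF-8 (the function name is made up):

```python
def reencode_wrongly(text: str, rounds: int) -> str:
    """Simulate `rounds` hops that each read UTF-8 bytes as Latin-1."""
    for _ in range(rounds):
        text = text.encode("utf-8").decode("latin-1")
    return text

name = "ó" * 23
# UTF-8 size of the name after 0, 1, 2 and 3 bad hops.
sizes = [len(reencode_wrongly(name, n).encode("utf-8")) for n in range(4)]
print(sizes)  # [46, 92, 184, 368]
```

Every non-ASCII byte becomes a two-byte character on each bad hop, so the field doubles in size every time.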


I wouldn't say an accent "causes" UTF-8 encoding issues. If acute accents are a problem, then UTF-8 handling has completely failed.

It is amazing to me where I see failed encoding like that. For instance, many SEC filings and job ads for tech companies. I mean, I feel like I'm expected to spell things correctly on my resume and emails at work...


> If you interpret the UTF-8 é (0xC3A9) as ASCII it becomes à (0xC3) + © (0xA9)

As latin1 (ISO-8859-1) or Win-1252; ASCII doesn't have either à or ©.

latin1 is the default for text, including HTML, if you don't specify in protocols such as HTTP (modulo some stupidity from the WHATWG where it might be Win-1252 instead) and Windows-1252 is the default encoding in Windows in the USA (at least, prior to the Unicode APIs being added. The old APIs probably still exist though…). So these codecs pop up a lot in places where people who don't know what they're doing end up touching text.
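
A quick Python check of the above, as a sketch. The "sanity check" at the end is only my guess at what a system that strips non-ASCII characters might be doing:

```python
name = "Léon"
utf8_bytes = name.encode("utf-8")           # b'L\xc3\xa9on'

# Strict ASCII cannot represent these bytes at all.
try:
    utf8_bytes.decode("ascii")
    ascii_decodes = True
except UnicodeDecodeError:
    ascii_decodes = False

as_latin1 = utf8_bytes.decode("latin-1")    # 'LÃ©on'
as_cp1252 = utf8_bytes.decode("cp1252")     # identical for these two bytes

# Hypothetical "sanity check" that drops everything outside ASCII.
ascii_only = "".join(c for c in as_latin1 if ord(c) < 128)

print(ascii_decodes, as_latin1, as_cp1252, ascii_only)
```

So the two-character garbage comes from a Latin-1 or Windows-1252 decode, and dropping both garbage characters afterwards is what turns the name into 'Lon'.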


The WHATWG HTML spec requires UTF-8 for conforming documents and scripts [WHATWG 4.2.5.4]. In both HTML specs, charset declarations, if provided, must be UTF-8 [4.2.5].

If the transport, content-type, lack of charset declaration, and sniffing fail to determine an encoding, both specs use defaults based on the configured locale; for English that's windows-1252 [WHATWG: 12.2.3.2 W3C: 8.2.2.2]. latin1/ISO-8859-1 is prohibited. [WHATWG: 12.2.3.3 W3C: 8.2.2.3].


I ran across some code once for descrambling data that had been incorrectly processed like that, which I found common in legal documents. It's an interesting problem, because strictly speaking, it's lossy, but you can use probabilities to figure out something plausible. You can decode/encode one thing as another, or you can decode/encode multiple times...
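
The simplest version of the "decode/encode multiple times" idea looks roughly like this in Python. It is a minimal sketch, not the code I ran across: it assumes the damage was UTF-8 read as Windows-1252, the function name is made up, and it uses no probabilities at all, so it can "fix" text that was never broken.

```python
def fix_mojibake(text: str, max_rounds: int = 3) -> str:
    """Undo UTF-8-read-as-cp1252 damage while the reversal stays clean."""
    for _ in range(max_rounds):
        try:
            # Recover the original bytes, then decode them the right way.
            candidate = text.encode("cp1252").decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            break  # not reversible this way, keep what we have
        if candidate == text:
            break  # pure ASCII, nothing left to undo
        text = candidate
    return text

broken_once = "Léon".encode("utf-8").decode("cp1252")
broken_twice = broken_once.encode("utf-8").decode("cp1252")
print(fix_mojibake(broken_once), fix_mojibake(broken_twice))
```

The lossy part shows up when the original bytes included values that Windows-1252 leaves undefined (such as 0x81): those were dropped or replaced on the way in, so a round trip like this cannot bring them back.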


Any chance you have a link? I’ve had to implement solutions to this myself and it’s very tedious. If someone has built a more complete solution I would love to just use that instead.


This HN thread has some links and discussion: https://news.ycombinator.com/item?id=16103356


That might be what I'm remembering; then again, I don't really do Python, so maybe it was something else. I doubt it was anything better than the link above, regardless.


You could try inputting your name as [Latin Small Letter E][Combining Acute Accent]:

e◌́ => é

Which should keep the `e` intact, while the combining acute accent (0xCC 0x81) may "only" get converted to a `Ì`, which may be stripped. 0x81 is undefined in Windows-1252, so I have no idea what would happen to that, but it would probably be stripped as well, keeping just Leon.


Unless someone decides to NFC-normalize the text along the way. And it's generally agreed that text should be normalized with NFC, although there is often a fierce debate about who should do it ("not me").
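
Both outcomes are easy to reproduce in Python. This is a sketch: `errors="ignore"` stands in for a system that silently drops undefined bytes, which is an assumption about how any particular airline system behaves.

```python
import unicodedata

decomposed = "Le\u0301on"  # 'e' followed by U+0301 COMBINING ACUTE ACCENT
raw = decomposed.encode("utf-8")  # b'Le\xcc\x81on'

# Python's cp1252 codec treats 0x81 as undefined, so a strict decode would
# raise; errors="ignore" mimics a system that silently drops the byte.
mangled = raw.decode("cp1252", errors="ignore")            # 'LeÌon'
stripped = "".join(c for c in mangled if ord(c) < 128)     # 'Leon'

# NFC recomposes the pair into the single code point U+00E9, which puts
# you back at the two-byte UTF-8 sequence 0xC3 0xA9.
composed = unicodedata.normalize("NFC", decomposed)

print(mangled, stripped, composed.encode("utf-8"))
```

So the trick survives only as long as nothing between the form and the database normalizes the input.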


Reminds me of the times when Amazon failed to reproduce the ü in my last name on their shipping labels. They consistently printed the UTF-8 encoded character interpreted as an 8-bit ASCII sequence. That bug was present for a couple of years.


"I was not trying to do SQL injection, sir! My name is John Letme'or True"


> the internet is going to wipe out apostrophes from people’s actual names

Also seeing that with accented uppercase letters in French, even in nouns, because it's hard to type them on Windows.

People still use accents in lowercase of course, but think that it's incorrect to use accents for uppercase letters, even when handwriting.


I only learned they do have the accents from your post. I was taught to omit them about a decade ago (as a second language).


It is obligatory in Spanish to use accents in both lowercase and uppercase...


I did some research a few weeks back about why I have an apostrophe in my name. When the British conquered the Irish they started keeping records of the citizenry. The Ó used in Irish names to track descent was eschewed for O' in British record keeping.

https://en.wikipedia.org/wiki/Irish_name


Holy crap- still? This was an issue on HN like 10 years ago or so and I thought the word got out to fix it.


It was also an issue 10 years before that. You'd think the word would get out to fix it...


Just because the engineers know that there's a tricky problem with input validation doesn't mean the business people want to take the time to solve it, unfortunately.


For those unfamiliar with Bobby:

https://xkcd.com/327/
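
And for anyone wondering why an apostrophe in a name breaks lookups at all, here is a minimal sketch with Python's built-in `sqlite3` module. The `passengers` table and the name are made up for illustration; the contrast is between pasting the value into the SQL text and passing it as a parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE passengers (name TEXT)")

name = "O'Brien"

# Fragile: the apostrophe ends the SQL string literal early.
try:
    conn.execute(f"INSERT INTO passengers (name) VALUES ('{name}')")
    naive_ok = True
except sqlite3.OperationalError:
    naive_ok = False

# Safer: the driver passes the value separately from the SQL text.
conn.execute("INSERT INTO passengers (name) VALUES (?)", (name,))
rows = conn.execute(
    "SELECT name FROM passengers WHERE name = ?", (name,)
).fetchall()

print(naive_ok, rows)
```

With parameters, the apostrophe is just another character in the value, so there is nothing to strip or reject.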


D'Von?


Nah, pretty common Irish last name



