Emoji *already existed* in documented (not to mention frequent) use when they we...

rabidrat · on July 29, 2018

> That's fine. There is plenty of space in the Unicode standard.

But font creators don't have infinite time. Already it is impossible to know which of the newer characters will be supported by which OS/fonts/etc. Unicode characters are all but useless if they can't be displayed.

Freak_NL · on July 29, 2018

That's fine too. There is no need for every font to have support for Byzantine Musical Symbols or Coptic Epact Numbers. Operating Systems tend to have a collection of fonts installed that can handle every common use case. If you depend on glyphs introduced in special use areas you are more than likely aware of the need to install a special font.

Sure, you may need to be aware as a developer that a new character introduced in last month's Unicode update isn't likely to be supported yet, but that's no different from any other technology like CSS.

grahamburger · on July 29, 2018

It would be nice if there was a reference set of glyphs. Like if in order to add a character to Unicode you had to also add a default glyph. Then at least if a font doesn't have that character the default could be displayed instead of a black box.

squiggleblaz · on July 30, 2018

When I'm running Linux/X, if the current font doesn't know a character it renders the character using another font on the system that does. So then it's just a matter of including a set of fonts that covers every character, and I think a reasonable attempt has been made to do that - although maybe that's based on the fonts I've chosen to install, since that's something that's moderately important to me.

When I'm running Windows (for work) it only renders characters for the current font, and I get a box instead of a character. (Traditionally at least; maybe this has changed? - it's not important to my work so I haven't really investigated, but as a feature it's so useful it's hard to believe it hasn't happened yet.)

But as you can see this isn't a problem with Unicode but the configuration of your system. If it chooses to show you a character from another font, then that's good and convenient.
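As a rough illustration of the fallback behaviour described above, here is a minimal sketch in Python. The font names and coverage sets are made up for the example; a real text stack would read coverage from each font's cmap table and use its own fallback ordering.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical coverage table: font name -> code points it has glyphs for.
# These fonts and ranges are invented for illustration only.
FONT_COVERAGE: Dict[str, Set[int]] = {
    "PrimarySans": set(range(0x20, 0x7F)),      # printable ASCII
    "FallbackGreek": set(range(0x370, 0x400)),  # Greek and Coptic block
    "FallbackEmoji": {0x1F600, 0x1F4A9},
}


def pick_font(ch: str, preferred: str,
              coverage: Dict[str, Set[int]] = FONT_COVERAGE) -> Optional[str]:
    """Return the preferred font if it covers ch, else the first font that does."""
    cp = ord(ch)
    if cp in coverage.get(preferred, set()):
        return preferred
    for name, code_points in coverage.items():
        if cp in code_points:
            return name
    return None  # no installed font covers it: this is where the box shows up


def plan_runs(text: str, preferred: str) -> List[Tuple[str, Optional[str]]]:
    """Pair every character with the font that would be used to draw it."""
    return [(ch, pick_font(ch, preferred)) for ch in text]


if __name__ == "__main__":
    for ch, font in plan_runs("a\u03b1\U0001F600\U0001D000", "PrimarySans"):
        print(f"U+{ord(ch):04X} -> {font}")
```

With fallback, only the last character (a Byzantine musical symbol, which none of the invented fonts covers) ends up without a font. Without fallback, everything outside `PrimarySans` would.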

Note that Windows support for emoji is significantly better. I think Gtk+3/Gnome apps and Firefox have improved significantly this year (wow, lots of color!), but they're still lagging.

rspeer · on July 30, 2018

> So then it's just a matter of including a set of fonts that covers every character

Have you ever tried to literally do this? You won't succeed.

The Google Noto font family is getting there, but I think they're only caught up with like Unicode 6, and we're on Unicode 11 now. There are some recently-added scripts that you won't find in any font. The Unicode tables render them in proprietary fonts that are obfuscated in the PDF, usually without even any information about how to buy them.

nneonneo · on July 30, 2018

The list of font contributors can be found here: http://www.unicode.org/charts/fonts.html. Choice quote:

> The Unicode Consortium currently uses over 390 different fonts to publish the code charts and figures for The Unicode Standard. The overwhelming majority of these fonts are specially tailored for this purpose and have been donated to the Consortium with a restricted license for use in documenting the standard.

You might be able to reach out to the vendors listed and try to buy the font, but it seems the majority were produced only for Unicode's use.

speleo_engr · on July 29, 2018

That doesn't make sense for some complex (shaped) scripts where you have things like zero width joiners.

grahamburger · on July 29, 2018

Are there characters that really can't be drawn individually? Surely there's something better than a black box for pretty much any character.

squiggleblaz · on July 30, 2018

Yes, I disagree that it doesn't make sense. You're not supposed to see a ZWJ, so the correct render of it is not visible. So even if you include it in your font - well, the correct thing for your font engine to do is to apply the semantics of it correctly and change the render of the surrounding characters.
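To make the ZWJ point concrete, here is a small sketch using Python's standard `unicodedata` module. The Devanagari sequence (KA, VIRAMA, ZWJ) is just an example I picked; the `describe` helper is hypothetical.

```python
import unicodedata
from typing import List, Tuple


def describe(text: str) -> List[Tuple[str, str]]:
    """List each code point in text with its Unicode general category."""
    return [(f"U+{ord(ch):04X}", unicodedata.category(ch)) for ch in text]


# Devanagari KA + VIRAMA + ZERO WIDTH JOINER.
# The ZWJ has category Cf (format): it has no ink of its own and only
# tells the shaper how to render its neighbours.
sequence = "\u0915\u094d\u200d"

if __name__ == "__main__":
    for code_point, category in describe(sequence):
        print(code_point, category)
```

The string is three code points long, but what it asks for on screen is a single half-form of the consonant, which is why a per-code-point fallback glyph doesn't map cleanly onto it.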

speleo_engr · on July 30, 2018

Yes, but there are glyphs, especially in Indic languages, that have no representation in a presentation form in Unicode. If you are familiar with Arabic, each glyph has forms - isolated, beginning, medial, and final. Your "typical" Arabic string is composed of isolated code points, they go through a text shaper, and out come glyph indices that also have a Unicode code point associated with them. You do the same thing with an Indic language, except imagine that the glyph indices that come out have no corresponding Unicode code points. It's very surprising at first and some software can't handle these unassigned or "hidden" glyphs.

So my point is: how much value would there be in requiring a representation of a small fraction of the glyphs (only those with Unicode code points, many of which are ZWJs) in a script in a standards document when potentially hundreds of forms are necessary after shaping to represent the language?
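The Arabic half of that comparison can be checked from the Unicode character database. This is a minimal sketch, assuming Python's `unicodedata` module; `presentation_forms` is a hypothetical helper that only scans the Arabic Presentation Forms-B block. It is not a text shaper, it just shows that Arabic positional forms do have their own code points.

```python
import unicodedata
from typing import Dict


def presentation_forms(base: str) -> Dict[str, str]:
    """Map positional tag (isolated/initial/medial/final) to the encoded
    presentation-form character for a single Arabic base letter."""
    target = f"{ord(base):04X}"
    forms: Dict[str, str] = {}
    for cp in range(0xFE70, 0xFF00):  # Arabic Presentation Forms-B
        parts = unicodedata.decomposition(chr(cp)).split()
        # Entries look like "<initial> 0628"; skip multi-letter ligatures.
        if len(parts) == 2 and parts[1] == target:
            forms[parts[0].strip("<>")] = chr(cp)
    return forms


if __name__ == "__main__":
    for tag, ch in sorted(presentation_forms("\u0628").items()):  # ARABIC LETTER BEH
        print(f"{tag:9s} U+{ord(ch):04X}")
```

For the Indic case described above there is nothing comparable to look up: the conjuncts and half-forms exist only as glyph indices inside a font.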

titanix2 · on July 29, 2018

That’s a big problem with the standard in my opinion: standardization of new characters requires submitting a font for them, but this font does not need to be open. So you have standardized characters that cannot be displayed without someone else implementing another font. That’s a lot of duplicated effort.

zamadatix · on July 29, 2018

I'm not sure having over 100k glyphs in random fonts solves anything. I'm also not convinced font samples are something a character encoding should care about or try to standardize.

For practical use, Google's Noto font is under an open font license and covers so many glyphs its collection is split into multiple OTF files because of the 65k glyph limit per font. The goal of Noto is much the same as the one you propose - to have an open representation of every character (and in a consistent font).

Dylan16807 · on July 29, 2018

> covers so many glyphs its collection is split into multiple OTF files because of the 65k glyph limit per font.

Not because of that. Out of the hundred Noto files, several are the same CJK characters with different country and Sans/Serif/Mono styles, and everything else combined would fit into a single file.

speleo_engr · on July 30, 2018

Even removing CJK, there are still more than 65,535 glyphs necessary to represent everything in the SMP and BMP less CJK. If you look in BMP without CJK, surrogates, and private use areas, you are looking at around 27,000 code points. If you look at the SMP (supplementary multilingual plane), there are around 90 blocks of 4096 code points assigned. That total is well over 65,535. And keep in mind many scripts also require unassigned glyphs which are not Unicode code points themselves. These unassigned glyphs count against the 65,535 TTF limit, though.

https://en.wikipedia.org/wiki/Plane_%28Unicode%29

Dylan16807 · on July 31, 2018

Sure, it would take two files to do everything outside CJK. What I said is still true, that everything covered by the non-CJK Noto fonts would fit in a single file (50k glyphs total).

My point is that you only need 3-4 files to cover Unicode. Noto is not split into a hundred different files for that reason, but for other reasons.

(Also the used space on the SMP is roughly 90 * 256.)
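The two estimates in this subthread differ only in the assumed size per SMP block, so the arithmetic is easy to lay out. This is a sketch using the rough figures quoted above (27,000 non-CJK BMP code points, about 90 SMP blocks), not exact counts from the standard.

```python
TTF_GLYPH_LIMIT = 65_535  # maximum glyphs in a single font file

BMP_NON_CJK = 27_000      # rough figure quoted in the thread
SMP_BLOCKS = 90           # rough figure quoted in the thread

# Estimate assuming 4096 code points per block.
estimate_4096 = BMP_NON_CJK + SMP_BLOCKS * 4096

# Estimate assuming 256 code points per block.
estimate_256 = BMP_NON_CJK + SMP_BLOCKS * 256

if __name__ == "__main__":
    for label, total in (("90 * 4096", estimate_4096), ("90 * 256", estimate_256)):
        verdict = "fits in" if total <= TTF_GLYPH_LIMIT else "exceeds"
        print(f"{label}: {total:,} code points, {verdict} one file")
```

With 256 per block the total is 50,040, which is where the "50k glyphs" figure comes from. Both numbers count code points, not glyphs, so scripts that need extra shaped glyphs would push the real total higher.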

Dylan16807 · on July 29, 2018

People are going to use popular icons whether they're in Unicode or not. The burden of making it work is going to exist either way. And something like an emoji doesn't need to be in a general-purpose font.