Traditionally, characters under Unix were encoded in a locale-specific manner: ISO 8859-1 in Western Europe, ISO 8859-2 in Eastern Europe, EUC-JP in Japan, etc. In the 1990s, there was a major push to get XFree86 (the ancestor of X.Org) to switch to locale-independent UTF-8, led mainly by Markus Kuhn and Bruno Haible.
The link is to Markus Kuhn's web page, which appears to describe the UTF-8 software available around 1998 or so.
UTF-8 is not locale independent. You cannot correctly render multilingual UTF-8 text without also specifying its locale, and some transformations like uppercase/lowercase also depend on the locale.
E.g., some CJK characters render differently depending on whether the text is meant for mainland China, Taiwan, or Japan. One example is 骨 (from my old notes, so there's a tiny chance this example is incorrect).
> You cannot correctly render multilingual UTF-8 text without also specifying its locale
You can render it pretty well, not perfect, but good enough to actually read it, as opposed to not being able to render it at all or rendering mojibake à la Кракозябры instead.
At least touching Unicode strings in wrong locales only mildly corrupts the strings. Plenty of Win32 apps would crash if the system locale is in UTF-8.
I mean, UTF-8 string handling is language (of the given bitstream, not necessarily the system) dependent, e.g. Turkish lowercase I, Chinese Hanzi vs Japanese Kanji at same codepoints, etc etc...
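A minimal Python sketch of the Turkish case. `lower_tr` is a hypothetical helper written for this example, not a standard library function; Python's built-in `str.lower()` applies the default, language-independent Unicode mapping, so the Turkish rule has to be applied separately.

```python
def lower_tr(text: str) -> str:
    """Hypothetical helper: lowercase using the Turkish rules for I and İ."""
    # Turkish pairs dotless I (U+0049) with ı (U+0131)
    # and dotted İ (U+0130) with i (U+0069).
    return text.replace("I", "\u0131").replace("\u0130", "i").lower()


word = "I\u015eIK"                # "IŞIK", Turkish for "light"
default_lower = word.lower()      # default mapping turns I into dotted i
turkish_lower = lower_tr(word)    # Turkish mapping turns I into dotless ı
```

The code points are identical in both cases; only the language assumed for the text changes the result.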
The encoding itself is locale-independent. Some algorithms (rendering, casing, hyphenation etc.) depend on the locale.
This is unlike the older paradigm, where the encoding itself was dependent on the locale, making things like copy-paste between applications running in different locales problematic.
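To make the difference concrete, here is a small Python sketch using only the standard codecs. The byte value and sample word are chosen for illustration: the same byte means different characters under two legacy 8-bit encodings, while UTF-8 bytes decode to the same code points whatever locale the reader is in.

```python
raw = bytes([0xA3])  # a single byte with no encoding label attached

# Legacy 8-bit encodings: the meaning depends on the charset you assume.
as_latin1 = raw.decode("iso8859_1")  # pound sign in ISO 8859-1
as_latin2 = raw.decode("iso8859_2")  # L with stroke in ISO 8859-2

# UTF-8: the bytes carry the code points, independent of any locale.
text = "Кракозябры"
utf8_bytes = text.encode("utf-8")
round_trip = utf8_bytes.decode("utf-8")

# Mojibake: bytes written as KOI8-R but read back as Windows-1251.
mojibake = text.encode("koi8_r").decode("cp1251")
```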
That's not my experience. Users naturally get frustrated when I break the software that they rely upon, and sometimes they use strong words, but the resulting conversation is almost always friendly and productive. (There are exceptions, of course, but that's life, right?)
Here's a recent sample, paraphrased for brevity:
Them: this is broken.
Me: no, it's not broken.
Them (a few days later): "I think I must not have tried all the combinations", followed with two pages of transcripts.
Me: "I've just checked the code, and you're right [...] I'm extremely sorry I wasted your time."
Them: "Heh, it's all good. I'm chuffed you're taking the time to give thoughtful responses with me"
It will probably depend on whether NPUs are universally available in smartphones, and whether we get a standard API for accessing NPUs. But I don't know whether AI-based codecs can have battery usage competitive with fixed-function hardware.
Agreed, unless AI progress slows down enough that it becomes reasonable to bake weights into circuitry, the conventional approach will probably remain preferable for encoding, at least on power constrained devices.
> One of the interesting usage of AV1 was specifically for low bitrate calls, and software encoding was perfectly fine, even on mobile.
You really want hardware decoding on mobile, otherwise you end up with 40 minutes battery life. Fortunately, for typical videoconference resolutions, VP8 and H.264 are just fine. AV1 is nice to have, though, due to excellent support for synthetic content (screen sharing), and for scalable video coding (a much more elegant solution than simulcast, IMHO).
In the world I live in, the general plan is to stick to VP8 and H.264 for the time being, and to skip to AV1 when it's universally available on mobile. I haven't seen any features of AV2 which would justify waiting for it.
No, you do NOT want hardware anything on mobile if you are targeting lower bitrates that are not that taxing on the CPU, when the conditions are otherwise so bad that the call would either drop or be unusable. HW encoders produce bad results at low bitrates. HW decoders usually have issues with the temporal encodings used, and they may also just not accept those streams (a lot of test scenarios are movies, and the RTC tools are poorly supported).
The use case is not screensharing or a large conference room, but mainly a simple talking face for a 1:1 chat, with good quality, since packet loss is not as impactful on a 30KBps stream with AV1 as on a 50KBps VP8 stream.
I'd be interested in learning more, but the links you provide are just advertising copy. Could you please provide links to actual technical articles on your conclusions?
The internals are usually confidential and it's hard to find an engineer willing to make a comprehensive write-up about those: they want to make tech and not spend time proofing a tech write-up for public consumption (they already had to make an internal one!).
So the middle ground is that you have those "marketing" copies that demo the tech. One of the telling parts of those is how you can get a perfectly usable 30KBps stream with AV1 compared to a higher bitrate H264 stream that is unusable. It doesn't tell you that because you are using far fewer bytes, you will be trading CPU power consumption for radio power consumption, and it's a tricky comparison, but in general, it's a favorable trade for the user who has very bad network conditions and is trying to make a call. The goal is to make the call work at all costs, not to save the battery while transferring a useless stream of data.
There are a few reasons; I suspect fixed resource depth might be a factor in poor hardware single-pass encoding ...
What does limit them, though, is pseudo real time single pass pipelines.
I see the best encoding results from two pass - one fast run to work out the easy compress and hard compress parts of a video and then a second pass to get the optimal results on a stream that's already got a budget in mind for each section through the advantage of foresight as to what's left to do.
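A rough Python sketch of the budgeting idea behind two-pass encoding. This is an illustration under stated assumptions, not how any particular encoder does rate control: `allocate_bits` is a hypothetical function, and the per-section complexity scores are assumed to come from the fast first pass.

```python
def allocate_bits(complexities, total_bits):
    """Second-pass budget: split total_bits in proportion to first-pass complexity."""
    if not complexities:
        return []
    total = sum(complexities)
    if total == 0:
        # Nothing is harder than anything else: split evenly.
        return [total_bits / len(complexities)] * len(complexities)
    return [total_bits * c / total for c in complexities]


# Assumed output of pass 1: a relative complexity score per section of the video.
complexities = [1.0, 4.0, 1.0, 2.0]
budget = allocate_bits(complexities, total_bits=8_000_000)
```

A single-pass real-time encoder has no such foresight: it has to guess each section's share before it has seen what comes next.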
As someone else said, it's poor single pass encoding performance targeted for the tools used in real-time communications. This type of usage is "new" to hardware manufacturers and they poorly test it as it's easier to make a chip good enough for decoding the general case for watching your favorite movie platform than do something comprehensive.
One aspect of real-time encoding is that the frames are not ordered or structured in the same linear way as they used to be in older formats. Now, we have temporal and spatial encoding, which allows for better frame drops or efficiency, or a stream that is decodable at multiple resolutions at the same time.
An example of temporal encoding is that you have a sequence of frames at 15fps (T0) that are all referencing the previous one, and sometimes an I-frame that is a full independent picture you can start decoding from. Then, you can have another temporal layer (T1), where for every frame at the base 15 fps layer (T0), you insert a new frame that depends on it. You end up having a 30 fps stream! And if your network connection is worse, or your hardware can't keep up, you can drop the T1 layer and only use the T0 layer. This works great for real-time! In the specs, you could have more layers with more complex dependency chains, but 3 layers is as high as you want to go.
Spatial encoding is a bit different: you will have frames at the highest resolution, but they reference another frame at half the resolution (which may also do the same). Each higher layer just adds more detail on top of the smaller frame that you have at the base. To decode an image at a given resolution, you need to have all the frames below it available. This can also be combined with the temporal encoding above. While this isn't useful for 1-1 communication, in conference rooms it's a great optimization: while you may send your full HD picture to the server, you may not want to send that to everyone when you're just a thumbnail who is not actively speaking. So the conference server will not send the full HD picture, but only the lower resolution. And since you don't want to do the encoding on the server (it's expensive, slow and you need to trust the intermediate service with your secret stuff), doing spatial encoding on the client side is better.
Those techniques are all advanced ones that would be used if available universally. Unfortunately, a lot of hardware decoders choke on those, despite being part of the specs. And it's not just that they can't generate a stream with those, they also sometimes can't decode them (breaking the spec).
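The two-layer pattern described above can be sketched in Python. This is a simplified model for illustration (the function names are made up, and real codecs signal layers and references in the bitstream): even frames form T0, odd frames form T1, and dropping T1 leaves a stream that is still decodable.

```python
def temporal_layer(frame_index: int) -> int:
    """Two-layer pattern: even frames are T0 (base), odd frames are T1."""
    return frame_index % 2


def reference_of(frame_index: int):
    """Frame that this frame depends on in the simplified pattern."""
    if frame_index == 0:
        return None                # treated as the I-frame
    if temporal_layer(frame_index) == 0:
        return frame_index - 2     # previous T0 frame
    return frame_index - 1         # the T0 frame just before this T1 frame


def drop_to_layer(frames, max_layer):
    """Keep only frames whose temporal layer is at most max_layer."""
    return [f for f in frames if temporal_layer(f) <= max_layer]


frames = list(range(8))                # 8 consecutive frames of the 30 fps stream
base_only = drop_to_layer(frames, 0)   # T0 only: half the frames, i.e. 15 fps
```

No T0 frame ever references a T1 frame, which is what makes the drop safe.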
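As a sketch of the server-side choice, assuming a hypothetical `SpatialLayer` record and example resolutions (neither comes from any real conferencing server's API): the server forwards the lowest layer that is large enough for each receiver, plus every layer below it, and never re-encodes anything.

```python
from dataclasses import dataclass


@dataclass
class SpatialLayer:
    sid: int      # spatial layer id, 0 = base (lowest resolution)
    width: int
    height: int


def layers_to_forward(layers, target_height):
    """Pick the lowest layer tall enough for the target, plus all layers below it."""
    ordered = sorted(layers, key=lambda layer: layer.sid)
    for i, layer in enumerate(ordered):
        if layer.height >= target_height:
            return ordered[: i + 1]
    return ordered  # target exceeds the top layer: forward everything


layers = [
    SpatialLayer(0, 480, 270),
    SpatialLayer(1, 960, 540),
    SpatialLayer(2, 1920, 1080),
]
thumbnail = layers_to_forward(layers, target_height=180)      # base layer only
active_speaker = layers_to_forward(layers, target_height=1080)  # all three layers
```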
And finally, the hardware encoders are tuned for higher bitrate work. Ask them to do a 3MBps stream, they'll do fine. Ask them for a 30KBps stream, they'll make garbage most of the time.
> On average, proprietary programs are not better than open-source programs, but usually worse, because they are reviewed by fewer people and because frequently the programmers who write them may be stressed by having to meet unrealistic timelines for the projects.
There's also the fact that when you write open-source code, you're writing for a friendly audience. I've often found myself writing the code, letting it rest for a few hours, then rewriting it so that it is easier to read. Sometimes, the code gets substantially rewritten before I push.
There's no cooling period when you write code during your 9-5 job: it works, it has the required test coverage, ship it and move on to the next task.
Alternate app stores are still allowed. It's just that they are restricted to applications signed by developers who have paid a tithe to Google.
Google are obeying the letter of the law, while openly violating its spirit. Perhaps it'll be possible to attack them in court, but it will take years, and by that time they'll have found another trick.
No. The right analogy is an image of the Reichstag with Nazi banners.