It simply means the tokenizer's training corpus may have included a massive amount of German literature or accidentally oversampled a web page where that word was frequently repeated. Look up "glitch tokens" to learn more.
AI can one-shot problems too, if it has the necessary tools in its training data, or has the right thing in context, or has access to tools to search relevant data. Not all AI solutions are iterative trial and error.
Also
> humans are a lot better at (...)
That's maybe true in 2026, but it's hard to make statements about "AI" in a field that is advancing so quickly. For most of 2025, for example, AI doing math like this wouldn't even have been possible.
Walmart employs this many workers only because it is subsidized by food stamps and other government assistance. The minute they were forced to actually pay for the labor they employ, they would fire a lot of people.
> Walmart employs this many workers only because it is subsidized by food stamps
And then those food stamps are used at Walmart, it's a win-win for Walmart and Walmart. No other country gives their poor food stamps instead of money, I wonder why?
Scott Aaronson wrote a bit about the following thought [0]. If copying a brain and simulating reality à la The Matrix is possible at all, then if you get your brain copied you live one biological life but your copies have an unbounded number of existences (millions? billions? trillions?).
So, if copying brains is possible, and you don't know which version of you you are, you might have odds of, say, 1 in 1 trillion of living your first, biological life.
Which is to say, if copying brains is possible, you are likely to be running in a simulation already.
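To make the arithmetic concrete, here's a rough sketch. It assumes every run of "you" (the biological one plus each copy) is equally likely to be the one you're experiencing, which is my assumption and not something the argument proves. The function name is made up for illustration.

```python
from fractions import Fraction


def chance_of_being_original(num_copies: int) -> Fraction:
    """Chance that you are the single biological run.

    Assumes (hypothetically) that all runs are indistinguishable from the
    inside and equally likely: 1 biological life + num_copies simulated ones.
    """
    if num_copies < 0:
        raise ValueError("num_copies must be non-negative")
    return Fraction(1, num_copies + 1)


if __name__ == "__main__":
    # No copies, a million copies, a trillion copies.
    for copies in (0, 10**6, 10**12):
        p = chance_of_being_original(copies)
        print(f"{copies} copies -> {p} (about {float(p):.3g})")
```

With zero copies you're certainly the original; the more copies there are, the closer that chance gets to zero.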
[0] There are multiple links and I can't find where I first read it, but I found this one from 2024, https://scottaaronson.blog/?p=7774 and uhh.. turns out the argument isn't from him personally (and he doesn't even believe in it), and is best presented here https://simulation-argument.com/ (though it's presented very differently so idk)
Indeed, how do they deal with Chinese? Are some ideograms multiple tokens?
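One way to poke at this yourself, assuming a byte-level tokenizer (an assumption on my part, the thread doesn't say which tokenizer is meant): such tokenizers start from UTF-8 bytes and merge frequent byte sequences into single tokens. The sketch below only counts the raw bytes per character, so it shows the starting point before any merges, not what a real tokenizer outputs. The helper name is made up.

```python
def utf8_byte_count(ch: str) -> int:
    """Number of UTF-8 bytes needed to encode a single character."""
    return len(ch.encode("utf-8"))


if __name__ == "__main__":
    # Latin letter, two common CJK characters, and a rare CJK character
    # from outside the Basic Multilingual Plane (U+20000).
    for ch in ["a", "中", "語", "\U00020000"]:
        print(f"U+{ord(ch):04X} -> {utf8_byte_count(ch)} byte(s)")
```

Most CJK characters take 3 bytes in UTF-8, so under that assumption a character the tokenizer never learned a merge for would come out as several tokens, while frequent ones can end up as a single token.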