"Distillation and amplification" is a fairly popular AI technique. For example, if you have a chess engine with a heuristic for choosing which moves to investigate, you can explore the 20 best paths according to that heuristic, see which moves led to the best results, and use those results to train the heuristic's evaluation of the first move.
Doing the same thing with LLMs isn't out of the question, but for it to work well you need some kind of reward function that doesn't depend on the model you train. Training on LLM texts that humans consciously chose to publish might already provide that; you just have to somehow filter out the content farms that lack any human review.
Chess is "data mineable" -- you can get more training data just by having computers play chess against themselves. There are clear winners and losers. If you programmed in the rules of chess, a sufficiently powerful AI could learn everything there is to know about the game just by playing itself -- mining its own training data.
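To make that concrete, here's a sketch of "data mining" a game by self-play, using a tiny Nim variant (take 1-3 stones; whoever takes the last stone wins) instead of chess. The rules alone label every game with a winner, so fresh labeled data costs nothing but compute -- which is exactly the signal free-form language lacks.

```python
import random

random.seed(0)

def self_play(pile=10):
    """Play one game with random moves; return the (position, move,
    player) history plus the winning player."""
    history, player = [], 0
    while pile > 0:
        move = random.randint(1, min(3, pile))
        history.append((pile, move, player))
        pile -= move
        player = 1 - player
    winner = 1 - player            # the player who just moved took the last stone
    return history, winner

# Mine training data: every recorded move gets a win/loss label
# derived purely from the rules -- no human annotation involved.
dataset = []
for _ in range(1000):
    history, winner = self_play()
    for pile, move, player in history:
        dataset.append((pile, move, 1 if player == winner else 0))
```

A real system would then train a policy on `dataset` and use the improved policy to generate the next batch of games.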
There's no analogue for language. The system can't, on its own, determine whether what was said makes sense or is true. Maybe you could program in "the rules of grammar" and have AIs invent their own languages, but they'd have nothing to say to each other, so don't expect a translation of "a broken clock is right twice a day". Besides, that's not what anyone is actually doing.
This is why I'm saying that any technique like this that works must work by "squeezing out" more information from the existing training data (very likely overfitting in the process). You simply cannot mine new useful language training data the way you can mine bitcoin or 1v1 game data.
> for it to work well you need some kind of reward function that doesn't depend on the model you train. Training on LLM texts that humans consciously chose to publish might already provide that
Of course adding more human-curated data can improve the model. But the whole point of the arXiv article is to ask whether these AIs can improve themselves. It seems patently clear to me that the answer is "only if they're underfit to the data, and only up to a limit, after which they start to overfit on their own excrement". I just don't see any other possibility that doesn't rely on ChatGPT having magically reached the singularity already.
Look, even humans don't get perpetually more intelligent just by talking to themselves. After a certain point, all they get is more entrenched in bad ideas, more cult-like, more superstitious. Humans get more intelligent by interacting with the environment and seeing what happens.