"Distillation and amplification" is a fairly popular AI technique. For example, if you have a chess engine with a heuristic for choosing which moves to investigate, you can explore the 20 best paths according to that heuristic, see which moves led to the best results, and use those results to train the heuristic's evaluation of the first move.
Doing the same thing with LLMs isn't out of the question, but for it to work well you need some kind of reward function that doesn't depend on the model you train. Training on LLM texts that humans consciously chose to publish might already provide that; you just have to somehow filter out the content farms that lack any human review.
Chess is "data mineable" -- you can get more training data just by having computers play chess against themselves. There are clear winners and losers. If you programmed in the rules of chess, a sufficiently powerful AI could learn everything there is to know about the game just by playing itself -- mining its own training data.
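To make that concrete, here's a sketch of "data mining" a game by self-play, using a tiny Nim variant (take 1-3 stones; whoever takes the last stone wins) instead of chess. The rules alone label every game with a winner, so fresh labeled data costs nothing but compute -- which is exactly the signal free-form language lacks.

```python
import random

random.seed(0)

def self_play(pile=10):
    """Play one game with random moves; return the (position, move,
    player) history plus the winning player."""
    history, player = [], 0
    while pile > 0:
        move = random.randint(1, min(3, pile))
        history.append((pile, move, player))
        pile -= move
        player = 1 - player
    winner = 1 - player            # the player who just moved took the last stone
    return history, winner

# Mine training data: every recorded move gets a win/loss label
# derived purely from the rules -- no human annotation involved.
dataset = []
for _ in range(1000):
    history, winner = self_play()
    for pile, move, player in history:
        dataset.append((pile, move, 1 if player == winner else 0))
```

A real system would then train a policy on `dataset` and use the improved policy to generate the next batch of games.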
There's no analogue for language. The system can't, on its own, determine whether what was said makes sense or is true. Maybe you could program in "the rules of grammar" and have AIs invent their own languages, but they'd have nothing to say to each other, so don't expect a translation of "a broken clock is right twice a day". Besides, that's not what anyone is actually doing.
This is why I'm saying that any technique like this that works must work by "squeezing out" more information from the existing training data (very likely overfitting in the process). You simply cannot mine new useful language training data the way you can mine bitcoin or 1v1 game data.
> for it to work well you need some kind of reward function that doesn't depend on the model you train. Training on LLM texts that humans consciously chose to publish might already provide that
Of course adding more human-curated data can improve the model. But the whole point of the arXiv article is to ask whether these AIs can improve themselves. It seems patently clear to me that the answer is "only if they're underfit to the data, and only up to a limit, after which they start to overfit on their own excrement". I just don't see any other possibility that doesn't rely on ChatGPT having magically reached the singularity already.
Look, even humans don't get perpetually more intelligent just by talking to themselves. After a certain point, all they get is more entrenched in bad ideas, more cult-like, more superstitious. Humans get more intelligent by interacting with the environment and seeing what happens.