It helps, but an LLM could still code a destructive command (like an inlined python -c script) that you can't parse with rules and regex, nor can a gatekeeper LLM reliably understand its implications. My solution is sandbox + git, where the .git folder is write-protected inside the sandbox and any files outside the project are read-only too.
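A minimal sketch of that setup, assuming bubblewrap (bwrap) on Linux; the paths and the agent command are placeholders:

    import subprocess

    PROJECT = "/home/me/project"  # placeholder path

    # Whole filesystem read-only by default, working tree writable,
    # then .git re-protected (later bwrap mounts shadow earlier ones).
    subprocess.run([
        "bwrap",
        "--ro-bind", "/", "/",
        "--bind", PROJECT, PROJECT,
        "--ro-bind", f"{PROJECT}/.git", f"{PROJECT}/.git",
        "--dev", "/dev",
        "--proc", "/proc",
        "--chdir", PROJECT,
        "claude",  # placeholder agent command
    ], check=True)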
My personal anecdata: in both cases where Claude destroyed work, it was data inside the project being worked on, not matching any of the generic rules. Both could have been prevented by keeping git clean, which I didn't.
Nah, it does classify python -c as lang_exec = ask, and the optional LLM layer sees the actual code, but it's not bulletproof. Keeping a clean working tree is probably the single best defense regardless of tooling.
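For flavor, a toy version of such a rule layer; the categories and actions here are illustrative, not any real tool's config:

    import re

    # Hypothetical rules: inline interpreters escalate to "ask",
    # obviously destructive commands get denied outright.
    RULES = [
        (re.compile(r"\b(python3?|perl|ruby|node)\b.*\s-[ce]\b"), "lang_exec", "ask"),
        (re.compile(r"\brm\b.*-(rf|fr)\b"), "destructive", "deny"),
    ]

    def classify(command: str):
        for pattern, category, action in RULES:
            if pattern.search(command):
                return category, action
        return "unknown", "ask"

    print(classify("python -c 'import shutil; shutil.rmtree(\"x\")'"))
    # ('lang_exec', 'ask') -- the rule fires, but it can't see what the code does.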
> My only caution is that good writers and LLMs look very similar, because LLMs were trained on a corpus of good writers.
People moving to careless writing for authenticity while good writing gets flagged as AI? Funny. We want authentic human thought but can only detect human style.
> Whenever I see claims about AGI being reachable through large language models, it reminds me of the miasma theory of disease.
Whenever I see people think the model architecture matters much, I think they have a magical view of AI. Progress comes from high-quality data; the models are good enough as they are now. Of course you can still improve the models, but you get much more upside from data, or even better, from interactive environments. The path to AGI is not based on pure thinking, it's based on scaling interaction.
To stay with the miasma theory of disease analogy: if you think architecture is the key, look at how humans dealt with pandemics. The Black Death in the 14th century killed half of Europe, and no one could conceive of the germ theory of disease. Think about it, it was as desperate a situation as it gets, and no one had the simple spark of insight to keep up hygiene.
The fact is we are not smart from the brain alone, we are smart from our experience. Interaction and environment are the scaffolds of intelligence, not the model. For example, 1B users do more for an AI company than a better model; they act like human-in-the-loop curators of LLM work.
If I'm understanding you, it seems like you're subject to hindsight bias. No one knew the miasma theory was wrong... it could have been right! Only with hindsight can we say it was wrong. Seems like we're in the same situation with LLMs and AGI.
The miasma theory of disease was "not even wrong" in the sense that it was formulated before we even had the modern scientific method to define the criteria for a theory in the first place. And it was sort of accidentally correct in that some non-infectious diseases are caused by airborne toxins.
Plenty of scientific authorities believed in it through the 19th century, and they didn't blindly believe it: it had good arguments for it, and intelligent people weighed the pros and cons of it and often ended up on the side of miasma over contagionism. William Farr was no idiot, and he had sophisticated statistical arguments for it. And, as evidence that it was a scientific theory, it was abandoned by its proponents once contagionism had more evidence on its side.
It's only with hindsight that we think contagionism is obviously correct.
It really depends what you mean by 'we'. Laymen? Maybe. But people said it was wrong at the time with perfectly good reasoning. It might not have been accessible to the average person, but that hardly means that only hindsight could reveal the correct answer.
It's unintuitive to me that architecture doesn't matter - deep learning models, for all their impressive capabilities, are still deficient compared to human learners as far as generalisation, online learning, representational simplicity and data efficiency are concerned.
Just because RNNs and Transformers both work with enormous datasets doesn't mean that architecture/algorithm is irrelevant, it just suggests that they share underlying primitives. But those primitives may not be the right ones for 'AGI'.
> Of course you can still improve the models, but you get much more upside from data, or even better - from interactive environments.
I believe, on the contrary, that the hunt for better data is an attempt to climb the local hill and get stuck there without reaching the global maximum. Interactive environments are good, they can help, but they are just one of the possible ways to learn about causality. Is it the best way? I don't think so; it is the easier way: just throw money at the problem and eventually you'll get something that you'll claim to be the goal you chased all this time. And yes, it will have something in it you will be able to call "causal inference" in your marketing.
But current models are notoriously difficult to teach. They eat an enormous amount of training data; a human needs much less. They eat an enormous amount of energy to train; a human needs much less. It means the very approach is deficient. It should be possible to do the same with a tiny fraction of the data and money.
> The fact is we are also not smart from the brain alone, we are smart from our experience. Interaction and environment are the scaffolds of intelligence, not the model.
Well, I learned English almost all the way to B2 by reading books. I was too lazy to use a dictionary most of the time, so it was not interactive: I didn't even interact with a dictionary, I was just reading books. How many books did I read to get to B2? ~10 or so. Well, I also read a lot of English on the Internet and watched some movies, so let's multiply those 10 books by 10. Strictly speaking it was not B2: I was almost completely unable to produce English, and my pronunciation was not just bad, it was worse. Even now I sometimes stumble on words I cannot pronounce. Like, I know the word and have mentally constructed a sentence with it, but I cannot say it, because I don't know how. So to pass B2 I spent some time practicing speech, listening, and writing, and learning some stupid topic like "travel" to have the vocabulary to talk about it at length.
How many books does an LLM need to consume to get to B2 in a language unknown to it? How many audio recordings does it need? A lifetime wouldn't be enough for me to read and/or listen to that much.
If there were a human who needed to consume as much information as an LLM to learn, they would be the stupidest person in all the history of humanity.
> With only instructional materials (a 500-page reference grammar, a dictionary, and ≈400 extra parallel sentences) all provided in context, Gemini 1.5 Pro and Gemini 1.5 Flash are capable of learning to translate from English to Kalamang—a Papuan language with fewer than 200 speakers and therefore almost no online presence—with quality similar to a person who learned from the same materials
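Roughly what that setup looks like mechanically; this is a sketch of the in-context approach with placeholder file names, not the paper's actual harness:

    # Stuff all instructional material into one long context,
    # then ask for a translation -- no fine-tuning involved.
    grammar = open("kalamang_grammar.txt").read()      # the ~500-page grammar
    dictionary = open("kalamang_dictionary.txt").read()
    parallel = open("parallel_sentences.txt").read()   # the ~400 sentence pairs

    prompt = (
        "Using only the materials below, translate English to Kalamang.\n\n"
        f"GRAMMAR:\n{grammar}\n\n"
        f"DICTIONARY:\n{dictionary}\n\n"
        f"PARALLEL SENTENCES:\n{parallel}\n\n"
        "Translate: 'The children are swimming in the river.'"
    )
    # Send `prompt` to a long-context model (the paper used Gemini 1.5);
    # everything has to fit in the context window.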
I'm not entirely sure that I'm totally convinced, but yeah, it is better than me. I mean, I could do the same, but it would take me ages to go through 500 pages and use them for the actual translation.
I'm not sure, because Gemini knows a lot of languages. The third language is easier to learn than the second one; I suppose the 100th language is even easier? But still, Gemini does better than I believed.
Are you asking how many books a large language model would need to read to learn a new language if it was only trained on a different language? Probably just one (the dictionary).
Luck. RNNs can do it just as well, and so can Mamba, S4, etc., for a given budget of compute and data. The larger the model, the less the architecture makes a difference. It will learn in any of the 10,000 variations that have been tried and come within about 10-15% of the best. What you need is a data loop, or a data source of exceptional quality and size; data has more leverage. Architecture games mostly affect efficiency: one method can be 10x more efficient than another.
That's not how I read the transformer stuff around the time it was coming out: they had concrete hypotheses that made sense, not just random attempts at striking it lucky. In other words, they called their shots in advance.
I'm not aware that we have notably different data sources before or after transformers, so what confounding event are you suggesting transformers 'lucked' in to being contemporaneous with?
Also, why are we seeing diminishing returns if only the data matters? Are we running out of data?
The premise is wrong, we are not seeing diminishing returns. By basically any metric that has a ratio scale, AI progress is accelerating, not slowing down.
The METR time-horizon benchmark shows steady exponential growth. Frontier-lab revenue has been growing exponentially from basically the moment they had any revenue. (The latter has confounding factors. For example, it doesn't just depend on the quality of the model but on the quality of the apps and products using the model. But model quality is still the main component; the products seem to pop into existence the moment the necessary model capabilities exist.)
Note we're in a sub-thread about whether 'only data matters, not architecture', so I don't disagree that functionality or revenue are growing _in general_, but that's not what we're talking about here.
The point is that core model architectures don't just keep scaling without modification. MoE, inference-time, RAG, etc. are all modifications that aren't 'just use more data to get better results'.
I did something similar: I put 18 years of comments on Reddit, HN, Slashdot, and 3 years of LLM chats into the system. I ended up with a similar conclusion: it was less useful than I expected. My intent was to do RAG over my corpus, giving an LLM direct access to what I've commented over the years, but unfortunately that much information has a negative effect on LLM creativity. Its responses started to fall in line too much with my ideas and it lost its spark. In the end my conclusion was that all that data was facing the past, while I want LLMs to improve in the other temporal direction.
I did the same but with GPT embeddings. My primary problem was different though: I wanted to find where I had talked about a related subject. Search works really well.
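A minimal sketch of that kind of search, assuming the OpenAI embeddings API; the model name and the comments.json layout are my assumptions:

    import json
    import numpy as np
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set

    def embed(texts):
        resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
        return np.array([d.embedding for d in resp.data])

    # comments.json: hypothetical dump of your posts, one string per entry.
    comments = json.load(open("comments.json"))
    corpus = embed(comments)
    corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)

    def search(query, k=5):
        q = embed([query])[0]
        q /= np.linalg.norm(q)
        scores = corpus @ q  # cosine similarity on normalized vectors
        return [(float(scores[i]), comments[i]) for i in np.argsort(scores)[::-1][:k]]

    for score, text in search("file system as agent memory"):
        print(f"{score:.3f}  {text[:80]}")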
I thought it was at most a monetary fine; do people go to jail for copyright infringement? But you seem to want to own all the air around your work, and the ground beneath it too. Nothing can exist around it, so a creative person would do better to avert their eyes rather than load up on useless ideas. Why should I install your "furniture" in my brain when I am not allowed to sit on it? In these cases I think authors provide a net negative to society by creating more works that further forbid others from creating in the same space.
Here, for example, any comment is open to read and respond to. On arXiv any paper can be downloaded, read, and cited. Wikipedia contains text from many thousands of editors building on each other. We like collaboration more than asserting our exclusivity rights. That is why these places produce better quality than work done for direct profit or, God forbid, ad revenue; that is where the slop starts flowing.
Your project, while interesting as an approach, is orders of magnitude more complex than the proposition here, which is to rely on agents' skills with file systems, bash, python, sed, grep, and other CLI tools to find and organize data, but also to maintain their own skills and memories. LLMs have gained excellent capabilities with files and can generate code on the fly to process them. It's people realizing that you can use a coding agent for any cognitive work, and it's better since you own the file system while easily swapping the model or harness.
I personally use a graph-like format organized as a simple text file: each node is prefixed with [id] and references other nodes inline by [id]. This works well with replace, diff, and git, and is navigable at larger scales without reading everything (see the sketch below). Every time I start work I have the agent read it, and at the end update it. This ensures continuity over weeks and months of work. This is my take on file system as memory: make it a graph of nodes, but keep it simple, a flat text file; don't prescribe structure, just node size. It grows organically as needed; I once got one to 500 nodes.
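What that might look like, plus a tiny parser; the file contents are invented and the format details are just my reading of the description above:

    import re

    # A memory file in the style described: one node per line,
    # [id] prefix, inline [id] references to other nodes.
    MEMORY = """\
    [1] Project goal: migrate billing to the new API. Blockers tracked in [3].
    [2] Decision: keep the legacy cron job until [3] is resolved.
    [3] Blocker: sandbox credentials expire weekly; ops ticket pending.
    """

    NODE = re.compile(r"^\s*\[(\d+)\]\s*(.*)$", re.MULTILINE)
    REF = re.compile(r"\[(\d+)\]")

    nodes = dict(NODE.findall(MEMORY))
    # Outgoing edges: ids referenced inside each node's text.
    edges = {nid: [r for r in REF.findall(text) if r != nid]
             for nid, text in nodes.items()}
    print(edges)  # {'1': ['3'], '2': ['3'], '3': []}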
It ends up being similar to how early PC software was written before people realized malicious software could be running: there was little to no memory safety between running programs, and this treatment of files as the contextual running memory is similar. It's a great idea until a security perspective is factored in. It will probably end up much like closed applications and their way of writing proprietary files, which will need some security layer that isn't there yet.
AI work creates surplus, and we eat it away by specializing and becoming more dependent. Work doesn't stop, it becomes higher stakes; we now depend on each other and on AI.
> I've spent decades building up and accumulating expert knowledge and now that has been massively devalued. Any idiot can now prompt their way to the same software.
Do you like the craft of programming more than the outcomes? Now you are in a better position than ever to achieve things.
> It is there to reduce our agency, to make it easier to fire us, to put us in even more precarious position, to suck even more wealth from those that have little to those that have a lot.
You could say this is the story of society, it makes us dependent on each other, reduces our agency, puts us in precarious positions (like WW2). But nobody would argue against society like that.
What happens here is that we become empowered by AI and gain some advantages which we immediately use and become dependent on, eventually not being able to function without them - like computers and even thermostats.
Does anyone think about how the economy would operate without thermostats? No fridges, no data centers, no engines... they all need thermostats. We have lost some freedom by depending on them. But we have also gained.