Doesn't look as bad as I expected tbh.
Sure, some stuff could be better, but I've seen much shittier vibe-coded projects (including my own). I'd be more interested in their workflows and testing pipeline though. They ship pretty often, but Boris still says he has 10+ PRs a day. I would be really curious what triggers a release, since it doesn't seem like every PR is released. I'm also curious how large their PRs really are.
There is a big difference between:
> Build plugins
and:
> Add 3px padding in line 5
if you claim "No code is written by humans anymore".
Yes, and even now, if you tell the LLM any private information inside the sandbox, it can leak that if it gets misdirected or prompt injected.
So there isn't really a way to avoid this trade-off: you can either have a useless agent with no info and no access, or a useful agent that is incredibly risky to use because it might go rogue at any moment.
Sure, you can choose roughly where on the scale you want to be, but any usefulness inherently means risk if you run LLMs async without supervision.
The only absolutely safe way to give access and info to an agent is with manual approvals for anything it does, which gives you review fatigue in minutes.
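To make the manual-approval idea concrete, here is a rough Python sketch of what "approve everything" looks like. All names (`ApprovalGate`, `ask_on_terminal`, the tool registry) are made up for illustration and don't come from any real agent framework. It only shows the shape of the gate, assuming every tool call goes through one choke point.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ApprovalGate:
    """Hypothetical wrapper: every tool call needs an explicit human yes."""
    approve: Callable[[str, dict], bool]    # assumed callback that asks a human
    tools: Dict[str, Callable[..., str]]    # assumed registry of callable tools
    log: List[str] = field(default_factory=list)

    def call(self, name: str, **kwargs) -> str:
        if name not in self.tools:
            raise KeyError(f"unknown tool: {name}")
        # Nothing runs until the reviewer has seen the tool name and arguments.
        if not self.approve(name, kwargs):
            self.log.append(f"denied {name}")
            return "denied by reviewer"
        self.log.append(f"approved {name}")
        return self.tools[name](**kwargs)


def ask_on_terminal(name: str, args: dict) -> bool:
    """Blocks until a human types y; anything else counts as a no."""
    return input(f"Allow {name}({args})? [y/N] ").strip().lower() == "y"
```

You would wire it up with `ApprovalGate(approve=ask_on_terminal, tools={...})`. The review fatigue is easy to see here: the prompt fires on every single call, so an agent doing a few dozen file reads already means a few dozen interruptions.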
FWIW I reported your post to the mods because it reads completely AI generated to me. My judgement was that it might have been slightly edited but is largely verbatim LLM output.
Some tells that you might wanna look at in your writing, if you truly did write it yourself without any LLM input, are these contrarian/pivoting statements. Your post is full of these and it is imo the most classic LLM writing tell atm. These are mostly variants of the "It's not X but Y" theme:
- "Not whether they've adopted every tool, but whether they're curious"
- "I still drive the intuition. The agents just execute at a speed I never could alone."
- "The model doesn't save you from bad decisions. It just helps you make them faster."
- "That foundation isn't decoration. It's the reason the AI is useful to me in the first place."
- "That's not prompting. That's engineering"
It is also telling that the reader basically can't take a breather: most sentences try to emphasize harder than the last one. There is no fluff, no getting sidetracked. It reads unnaturally; humans usually don't think like this.
When did they stop putting competitor models on the comparison table btw?
And yeah, I mean, the benchmark improvements are meh. Context window and lack of real memory are still issues.
Yeah, there is a Skyrim mod that lets you talk to any NPC and basically queries an LLM behind the scenes, even with a screenshot of the scene so it can react to your clothing etc. If you insult the NPC, the LLM understands that dynamically and makes it attack.
I mean, that's not a new game concept, but I definitely think it levels up the experience.
In September he burned through $3000 in API credits, though I think that was before we finally bought max plans for everyone who wanted one.