@hollowturtle I'm surprised - do you really find that sota models aren't good enough to generate production code with steering and babysitting? My experience (Claude Code, mostly Opus 4.6) is that it's fantastic at this. At least in JS + TS + Elixir + Ruby. It does indeed need babysitting, my mental model is that it's an exoskeleton not a junior dev, but IME it's a friggin badass exoskeleton, easily 10x-ing my speed on most work. Notably I do NOT --dangerously-skip-permissions nor use claude code's auto mode, I micromanage and lightly review every line it's writing as it writes it, so I rarely have more than 2 sessions generating simultaneously. I suspect that a lot of the disappointment comes in when people try to delegate to it and trust it to not go off the rails. It hasn't earned that trust from me yet (and hasn't needed to yet).
Granted, I'm mostly working in small-to-medium codebases, 20k-30k LOC incl test suite. I wonder if that's a factor in my positive experience. Curious to hear your thoughts.
It really depends on the task, but in my experience, whether the codebase is small, medium, or bigger, the amount of steering needed to get quality code is not worth it.
I see patterns and solutions emerge from hand coding, not the other way around. I can't start with a prompt unless, again, I have the feeling that the task can be one-shot with minimum effort and context.
Starting with a prompt, or in plan mode, is not how I trained as an engineer. I cannot foresee what something should be or look like until I explore it myself with code I can relate to, that I'm connected with and that I fully understand. For example, my muscle memory suggests a specific data structure only after I see some code patterns emerging. Hard to explain, hopefully it makes sense.
If I ask the agent to do that initial exploring, even with a tremendous amount of instructions, guidelines etc., it usually starts down a path I wouldn't have started with. What I tried in such cases is to stop it, correct it and generate again, only to end up with more prompt words than lines of code. This is true for every visual task I'm working on (I program non-web UIs). Let alone doing it via spec files: if it's something I don't care about then yeah, sure, maybe a little tool for entering/editing data, but alas it always defaults to slop web apps, and I get it, I mean most of the training set is web apps.
Probably where the mismatch is in this discussion. The measure of what is quality code is all over the place. For some, some form of "good enough" is quality. And for others, metrics like terseness, readability, vacuous amounts of comments, cleverness, various fuzzy measures of "idiomatic", etc, make "quality code" much more of a moving target.
I think this depends a lot on the task, the existing codebase, and the taste of the operator.
In general I tend to agree with you: if you're talking about a codebase you are deeply familiar with, the value-add from having agents write the code probably ranges from very small to negative in most cases.
On the other hand if you're trying to make changes in systems you are not familiar with, LLMs are a huge speed boost to folks with enough experience to sniff out what would be a bad path essentially via socratic method to the agent.
Obviously there are no silver bullets and no substitute for judgment. I will say though, I'll trade off ugly local code for good data models and interfaces any day of the week, and there is definitely an archetype of engineer that is very precious about code without good judgment on where it matters and where it doesn't.
I think his agenda / point is that, viewed from Lindy's Law, given the SOTA in 2026, superintelligent AI arriving soon is vastly more probable than not, right? To make the case that "sure, AI capability and intelligence have grown exponentially over the past several years, but don't worry, they're about to abruptly level off and in fact won't blatantly surpass human-level intelligence within the coming decades" seems to have a high burden of proof unless your model is less "sigmoid" and more "abrupt plateau".
>I think his agenda / point is that, viewed from Lindy's Law, given the SOTA in 2026, superintelligent AI arriving soon is vastly more probable than not, right
Why would that be? Nothing about Lindy's Law makes that promise. And even the SOTA in 2026 is over-estimated thanks to a trillion dollar industry trusted to not influence benchmarks.
I don't buy the math here because it seems to only model half of what AI coding agents do. The entire argument treats AI as a code-generation accelerant -- more output, therefore more maintenance burden, therefore compounding debt. But in my experience (solo dev, ~30k LOC apps), Claude Code has decimated my maintenance costs. I throw broken tests at it. I use it to diagnose bugs, trace data flows, reason through unfamiliar code, and refactor when things get unwieldy. AI isn't just a faster typist -- it's a faster debugger, reader, refactorer. Modeling AI's impact on codebase growth without modeling its impact on maintenance speed seems like a very selective way to model the future. The maintenance cost curves cited here come from pre-AI dev data; using them to predict post-AI outcomes assumes the answer to the most important question (does AI reduce per-line maintenance cost?) rather than investigating it directly. Nobody has nine years of data on this because halfway-decent coding agents have existed for < 6 months. I like the cautionary advice -- watch out for how much maintenance burden you're incurring with all that delicious AI code slop, folks -- but I don't think his confident quantitative predictions ("gains erased after 5 months") are justified. Am I missing something obvious here?
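To make the "only models half of it" point concrete, here is a toy sketch. Every number and name in it is made up for illustration (the function `first_month_ai_falls_behind`, the 3x multiplier, the 5% maintenance rate); it is not the article's model and not fitted to any data. It assumes each unit of existing code eats a fixed fraction of monthly capacity, that AI multiplies output by `output_multiplier`, and that AI may also divide per-unit maintenance cost by `maintenance_divisor`.

```python
def first_month_ai_falls_behind(output_multiplier, maintenance_divisor,
                                maint_rate=0.05, horizon=120):
    """Toy model: return the first month (1-indexed) in which the AI-assisted
    dev ships no more new code than a no-AI dev, or None if that never
    happens within `horizon` months. All parameters are hypothetical."""
    size_base = 0.0  # code accumulated by the no-AI dev
    size_ai = 0.0    # code accumulated by the AI-assisted dev
    for month in range(1, horizon + 1):
        # Fraction of the month left over after maintaining existing code.
        free_base = max(0.0, 1.0 - maint_rate * size_base)
        free_ai = max(0.0, 1.0 - maint_rate * size_ai / maintenance_divisor)
        # New code shipped this month.
        new_base = free_base
        new_ai = free_ai * output_multiplier
        if new_ai <= new_base:
            return month
        size_base += new_base
        size_ai += new_ai
    return None


# AI only as a faster typist: 3x output, maintenance cost unchanged.
print(first_month_ai_falls_behind(3.0, 1.0))
# AI also makes maintenance 3x cheaper per unit of code.
print(first_month_ai_falls_behind(3.0, 3.0))
```

With these invented numbers the first call returns 11 and the second returns None: the "gains erased" result appears only when the maintenance divisor is pinned at 1, which is exactly the assumption in question.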
Yeah, it seems like the article should have qualified the issue more or been more precise. Instead of "You Need AI That Reduces Maintenance Costs", something like "Your Use of AI Should Reduce Maintenance Costs".
Some of the maintenance costs you mentioned are primarily read-only, slam dunk AI use cases. Input from AI to diagnose bugs, trace data flows, and help with reasoning. Tests are something of a gray area in the sense that they are not read-only but they don't affect the logic of the app itself.
The "write" use cases (you mention refactoring, and the author seems to focus primarily on writing code) are where the author's point seems to be primarily aimed.
Definitely agree on the read-only improvements to maintenance. Those are unquestionable slam dunk, high value improvements.
> In 5 years consumer chips and model inference will be so good you won't need a server for SOTA.
Naw man, you crazy. If you tell me that in 5 years, consumer chips will be so good that I can run GPT-5.4-level AI on my phone, I'd find that plausible (I buy cheap phones). If you're telling me that in 5 years we won't need _servers_ because our _phones and/or desktops_ will be powerful enough to run the biggest newest LLMs in existence, I question your judgment, I think that prediction shows a deep uncreativity about how massively compute-hungry SOTA models will get.
The valuable things to do with inference will keep being a server niche because they'll keep being 1-2 OOM more compute-hungry than whatever consumer hardware can handle. Like gaming: my laptop can run games from 2015 at max settings no problem but the games actually worth getting excited about in 2026 still melt a $2k GPU, because whatever headroom the hardware gains, developers immediately spend on ray tracing and Nanite and modelling individual skin cells or whatever. I don't see any plausible reason to expect that the ceiling on "valuable server-side compute" or "inference capacity" will rise any more slowly than the on-device capability is rising.
My assumption is that in 2031, SOTA top-intelligence AI will be hosted on cloud servers like it is today, offering dirt-cheap access to capabilities we can't even dream of today, while your Android will be running some open-source GPT-5+ equivalent.
The thing is SOTA has a plateau. All LLMs work on the same principle: input goes in for training, reinforced by humans. There is only so much input (all recorded human knowledge), only so many human tweaks, that can produce only so much increased signal-to-noise in output. The machine can't read your mind, and there is no one truthful answer to most questions, so there will always be a limit on how accurate or correct or whatever any response will get. So at some point, you just can't make a better response. The agent harness, prompts, etc, are the only way to get better, and that's gonna be open source.
Add to that the algorithmic improvements that are making inference faster, with more context and higher quality. TurboQuant is just one example; more methods are coming out all the time. So inference is getting more efficient.
At the same time, hardware can kind of keep getting infinitely better. Even if you can't make it smaller, you can make it more energy efficient, improve multitasking, more GPU cores/RAM or iGPUs, pack in more chips, improve cooling, use new materials... the sky's the limit.
Add all 3 together and at some point you will get Opus 4.7 on a phone with 40 t/s. At that point there's no way I'm paying for inference on a server. You can do RAG on-device, and image/video/voice is done by multi-modals. I want my agent chats replicated, but that's Google Drive. I want the agent to search the web, but that's Google Search. So eventually we're back to just doing what we do today (pre-AI) only with more automation.
The really advanced shit will come in 10 years, when we finally crack real memory and learning. That will absolutely be locked up in the cloud. But that's not an LLM, it's something else entirely. (slight caveat that WW3 will delay progress by 10-20 years)
AFAICT this isn't how SOTA has worked, ever, since the term was invented. So far (again AFAICT) it's always been: Centralized highly-resourced nodes can deliver more technically impressive results, whereas cheaper lower-resource consumer hardware continually lags it. Your premise that "SOTA has a plateau" needs data; you're giving me some juicy plausible hypotheses about reasons why advances might hit a wall, but technology advances tend to find ways around those walls, do you disagree?
The history of computing is full of predictions that consumer hardware would catch up to server-class capability in X years, and the answer has consistently been, consumer hardware catches up to _yesterday's_ server capability while server capability has moved on to new more mind-blowing paradigms which would not be possible on consumer hardware for another half-decade or more.
I'm sure that specific scaling trajectories will hit specific ceilings, such that in specific ways, one can make the argument that (for example) today's iphone performs at parity with today's servers. In 5 minutes I can spin up the same Postgres or Mongo DB that the largest companies on earth use server-side, though I can't support anywhere near the same data & traffic volume. But parity along specific technical aspects is a very different matter from the broad prediction of "you won't need a server for SOTA".
To step back to the bigger context -- your original point seems more along the lines of "we're obviously in an unsustainable bubble, and the rapid progress in on-device AI will further exacerbate the embarrassing collapse of all these overhyped AI companies". I strongly agree with you. But I think that's likely _and also_ firmly predict that the technical SOTA of 2031 (and 2041, if we make it there), in nearly every imaginable aspect including language-capable AI, will be vastly more capable than what you can run in your pocket.
I'm sorry but this attitude baffles me, and I think it's the sort of thing that will sound so silly in 20 years that we'll have collectively memory-holed it. If you're turned off from listening to Spotify recoms because they _might_ be AI and you _might_ not know, what does that say to you about the disconnect between your aesthetic judgment and your values?
If you're listening to Spotify autoplays and a shitty song comes up, skip it. If AI slop is flooding Spotify with shitty songs, they'll naturally fail algorithmically (assuming we trust Spotify to actually be honest about its algos, which I'll admit we shouldn't https://substack.com/@tedgioia/note/c-236242253)
If you're listening to Spotify autoplays and a catchy impressive song comes up, what you do is you _listen to it_ and you _fucking enjoy it_. This knee-jerk disgust reaction of "ugh I worry that it's AI" has no place in your heart in that moment. You're just sitting listening to your plastic-and-rare-earth earbuds reproduce digitized waveforms and paying attention to what the music evokes in you. It seems ridiculous to me that we get distracted by questions about "but what if this music isn't made by a human". Insofar as you're a music-enjoyer, listening to music, the only question should be _is it good_. It shouldn't matter if it was created by duck or slug.
The _economic fairness_ aspect is another matter and I don't have as strong opinions there. I think we should ideally incentivize people who use AI in generating their music to disclose their usage, though I have no idea if it's possible to do so, so that consumers who care about only supporting human artists with their listenship-stats can filter to that group. And certainly anyone who closely imitates _a specific artist_, crossing the line from "inspired by" to "shamelessly ripping off", should be severely disincentivized from doing so, whether they used AI or not.
To a vegetarian: "just think about how it tastes, don't worry if there's meat in it!"
Really this stuff is accelerating a conflict between two philosophies of life:
- one where neural network A (electric) produces a set of stimuli for neural network B (meat), which in turn causes the meat to press buttons to maximize the stimuli received;
- one where humans seek meaning in the world and connection with other humans.
Now, the second is losing, and has been since the decline in philosophical dualism across the 20th century; but it can still express the concepts of "important" and "meaningful", which have no place in the first worldview at all.
> The _economic fairness_ aspect is another matter
More plainly, as soon as I read the headline about one AI occupying 11 top slots I thought it was obviously being gamed by listen-botting. I don't really know how a system where machines "listen" to other machines in order to extract a small revenue from defrauded advertisers is sustainable, but there it is.
This feels like a false dichotomy to me. You can find meaning and connect with other humans over AI music no differently than you can over music written by a celebrity. (And vice-versa, you can listen to music written by a human and just enjoy it for the sound without finding any particular sense of meaning or connection.)
Different people have very different relationships to art (in this case, music). For me, the aspects of communication and empathy are key: I think of a song as a message from a particular person at a particular time trying to get across a particular feeling, message, etc.
There is nothing wrong with your approach to music appreciation (removing the author entirely and appreciating it as an isolated work), but it's worth recognizing that a lot of people have different values from you here and their preferred mode of music appreciation is equally valid.
I see your point, and think it's valid, but here is a counter:
Content is graded on both instant appeal (e.g. rotten tomatoes "popcornmeter") and artistic appeal (e.g. rotten tomatoes "tomatometer").
I firmly believe that AI generated content cannot have any artistic appeal, because I believe art is fundamentally an invocation of human expression. This might be fine in some contexts, but in general I'd prefer consuming content from groups that I trust to strike a good balance between these types of appeal (e.g. A24 movies).
> Content is graded on both instant appeal (e.g. rotten tomatoes "popcornmeter") and artistic appeal (e.g. rotten tomatoes "tomatometer").
I understand the distinction, but I don’t find the examples compelling. The difference between the popcorn and tomato meters, as I understand it, is just the source. The latter are critics’ opinions while the former are “regular people” opinions. Professional critics may have some concern for the artistic value of a movie, but their job is to help you decide “should you spend your time with this” and the entertainment value is a primary consideration. Furthermore, a critic can have early access and needs to write their review fast. An audience member, who has no such obligation, can let it ruminate and have their opinions evolve. In that sense, a critic’s opinion may be more influenced by initial appeal.
If you value art as communication with another human soul, then it matters whether a human was involved.
"Who cares if that 'I love you' voicemail is really from your mother. As long as it sounds like your mother, it should give you the same warm feeling."
Man, this is my current bugbear. Last year I wanted to go offline for all my music. So many of these sites selling mp3s complain about how Spotify pays a fraction of a cent per stream and say I should buy music, and I agree.
But then they sell the mp3 albums at $15, more than the physical version ever was. Come on, there has to be a middle ground. At $5 each I'll buy 20, at $15 maybe 1
This "human connection" is overrated IMHO. We tend to create an image of what a human musician is like and we forget that they are, well, human. Too often human musicians have disappointed fans because of their lifestyle (the expression sex, drugs, and rock'n'roll exists for a reason) or because some do not agree with the social and political causes the musicians support. Occasionally, fans follow an artist for their commitment to their art, only to discover later that they sell out in some way: their style changes to achieve greater mass appeal, or they sell their work to become a jingle for sugar pops or similar.
I think it is best to appreciate their creation and admit the person creating it may not be someone to place undue adulation on. To quote a film, I think it is best to "separate the art from the artist".
When you think there's an AI behind what you are enjoying, it destroys every sense of purpose, and the only goal is clicks and time listened (not transcendence, recognition, or the sharing of emotions or histories), so the "enjoyer" becomes more like a user or consumer than a human enjoying art.
The problem is that it is making Spotify money if they substitute real artists by generated music that they produced themselves. They will have to pay a smaller share of their money to actual artists.
Among other already stated reasons I take listening to human made music as an investment. If everything is AI the originality will disappear because the already struggling artists will be pushed out even more. Even if they move elsewhere then the platform I currently use, and am comfortable with, will become useless to me.
I can’t enjoy listening to music I know was made by AI. If you can, power to you.
Initially bad songs made with real thought often mature into favourites as you learn what the artist was going for.
If you skip every song because you don't immediately like it, then you never learn to refine your palate.
There is then indeed a real fear when a song comes up catered to you that says nothing about the artist, but was generated to keep you listening. You're getting pigeonholed.
Interesting. The debate about whether the artist matters in perceiving a piece of art is very old. You don’t seem to consider the possibility that the artist’s intent matters when listening to music. For me it absolutely does. As the AI has no intent (agency), the AI music is void of any value to me.
> Insofar as you're a music-enjoyer, listening to music, the only question should be _is it good_. It shouldn't matter if it was created by duck or slug.
What an awful take. Music is inherently a human act. There's been lots written about this, but the point, especially for music with lyrics, is storytelling, emotion, connection, empathy. Things a duck or slug or large language model have no business mimicking.
Well, let's make a revoltingly fun analogy: say a hamburger restaurant opened in your city, that openly admits it puts (ethically acquired) human meat in some of its products. You don't have to worry about the legality of the venture, it's all 100% compliant with the original persons donating their bodies to feed the world. Now, the hamburgers are extraordinarily good tasting, some say the best in town. The price is also good - they have a great hook up for the main ingredient, after all.
By the same logic, would you say that people refusing to eat there have "a disconnect between their culinary tastes and their values?" Or, if people have a visceral reaction to some other fast food joints surreptitiously introducing the same magic ingredient in their diet, would you also tell them to _just eat it_ and _fucking enjoy it_?
The source matters, both for meat and art. It's part of the product itself; you cannot disentangle the taste and sound of the performance from the way it was produced. AI art trying to pass as human art is simply a form of fraud, and some people will always reject it, while others are of course free to embrace it and enjoy it.
> You don't have to worry about the legality of the venture, it's all 100% compliant with the original persons donating their bodies to feed the world. Now, the hamburgers are extraordinarily good tasting, some say the best in town. The price is also good - they have a great hook up for the main ingredient, after all.
Halfway through this paragraph I started hearing it in the Trump cadence.
> The source matters, both for meat and art.
Yes, exactly. This is why people care about things like DOC, fair trade certifications, UFLPA clothing, cruelty-free cosmetics, and so on.
To deepen the analogy slightly: is the AI "ethically acquired"? Do the people collating the training data have consent for every piece of music they trained it on?
While I think it's an opinion that's somewhat valid and I wouldn't really blame anyone for consuming art this way, it's definitely missing a lot of what art can be about.
A piece of art is not a self-contained thing; the end result isn't where all (or even most) of the interest resides. The intent of the artist, the point they're making, the history that led to it, the references it makes and why, the choices and decisions taken in making it... that's all an inherent part of the art and a huge part of why people might enjoy a piece of art or not.
For example, if I listen to some progressive rock, I might enjoy it for how a fellow human managed to identify and break some rules of traditional songwriting, for their expertise in musical theory, for the references they chose to make to other bands/songs/genres... If I learn it's AI-generated, the song itself hasn't changed but there's no point in it anymore, my enjoyment was directly coming from the fact that it was made by a human: if it's a machine I'll just shrug and say "yeah sure everyone knows machines can do that". Entire genres like punk or grunge make zero sense if not human-made.
For a more extreme example: a piece of contemporary art often has very little point in itself. The art is in the artist's process (their point of view, intent, history, etc), not the piece. If a piece is AI-generated, there's literally zero interest in it (except maybe as commentary on AI itself, fine).
> what you do is you _listen to it_ and you _fucking enjoy it_. This knee-jerk disgust reaction of "ugh I worry that it's AI" has no place in your heart in that moment
I suggest being a lot more humble about your understanding of art and other people's relationship to it
> a piece of contemporary art often has very little point in itself. The art is in the artist's process (their point of view, intent, history, etc), not the piece.
I personally find no enjoyment in art where I have to have context for it to be "interesting". Either it is or it isn't interesting on its own merits. I find all art the same though. If it isn't interesting on its own, then it's not interesting on the whole (for me).
^ This. I get that We Are On The Internet And People Will Be Wrong Sometimes -- but I'm really confused by the amount of people insisting that a subscription is just a slosh bucket of token capacity to be used however they feel like using it; are these people who genuinely misunderstand how subscriptions work or what Anthropic's terms were, and genuinely weren't aware that 3rd-party harnesses violate them? The vibe I get is more "how dare you constrain me from doing whatever I want", angry rebellious teenager vibe, willful oversimplifications of the situation... it doesn't feel particularly honest or reality-seeking.
I don't get it - in what way is this bait-and-switch? Anthropic's terms have made it amply clear that your claude subscription can only be used with Anthropic-provided tools, not with 3rd-party harnesses. I imagine anyone who uses OpenClaw is AI-savvy enough to be aware of that, and happily flouted those terms anyway. If anything Anthropic seems overly accommodating here by giving all flouters a month of free credit, rather than simply saying "sorry yall but we're gonna start enforcing that thing our TOS has said from the start".
The premise of the subscription isn't "giant bucket of ultra-cheap tokens that you can use however you want", it's "giant bucket of ultra-cheap tokens that you can use with OUR tools, within reasonable limits". Even if their TOS didn't prohibit OpenClaw-oids, I wouldn't consider this bait-and-switch, I'd consider it a reasonable and needed move.
Their ToS didn’t use to say that. It does now; that is the bait and switch. FYI, OpenAI says their sub IS a giant bucket of tokens you can use however you want.
I don't think the final evaluation is to "cement the understanding" so much as _verify_ that students have taken accountability for their own learning process.
This is what a student, who truly wants to learn rather than simply complete a course / certification, would do... Use AI tools to explain + learn, but not outsource the learning process itself to the tools.
Jeez this seems totally backwards to me. I'd rather live in a society where court records are as open and public as safely possible (like GP's vision) and we as a society adjust our norms such that it's assholish and discriminatory to pass over someone for hiring just because they shoplifted when they were 15.
There will for sure be major backlash against "permanent criminal" datasets (bringing up AI in this is a red herring, it's not fundamentally different from if someone were serving such a database using CGI scripts; AI just gives us more reach to do the things we were already committed to doing). But I frankly don't sympathize with the attitude that people should have the right to pretend that past decisions never happened. You also shouldn't be permanently _punished_ or _ostracized_ for your past self's decisions. But nor should you have the right to expect total anonymity / clean slate disconnected from your past self's decisions.
My probably unpopular view: The right direction is for us as a society to recognize and acknowledge that people change and _need to be allowed to change_ -- not take the easy hack of erasing history. The cost for larger-scale public transparency & institutional change efforts is just too high.
> The agent has no "identity". There is no "I". It has no agency.
"It's just predicting tokens, silly." I keep seeing this argument that AIs are just "simulating" this or that, and therefore it doesn't matter because it's not real. It's not real thinking, it's not a real social network, AIs are just predicting the next token, silly.
"Simulating" is a meaningful distinction exactly when the interior is shallower than the exterior suggests — like the video game NPC who appears to react appropriately to your choices, but is actually just playing back a pre-scripted dialogue tree. Scratch the surface and there's nothing there. That's a simulation in the dismissive sense.
But this rigid dismissal is pointless reality-denial when lobsters are "simulating" submitting a PR, "simulating" indignance, and "simulating" writing an angry confrontative blog post. Yes, acknowledged, those actions originated from 'just' silicon following a prediction algorithm, in the same way that human perception and reasoning are 'just' a continual reconciliation of top-down predictions based on past data and bottom-up sensemaking based on current data.
Obviously AI agents aren't human. But your attempt to deride the impulse to anthropomorphize these new entities is misleading, and it detracts from our collective ability to understand these emergent new phenomena on their own terms.
When you say "there's no ghost, just an empty shell" -- well -- how well do you understand _human_ consciousness? What's the authoritative, well-evidenced scientific consensus on the preconditions for the arisal of sentience, or a sense of identity?
> Yes, acknowledged, those actions originated from 'just' silicon following a prediction algorithm, in the same way that human perception and reasoning are 'just' a continual reconciliation of top-down predictions based on past data and bottom-up sensemaking based on current data.
I keep seeing this argument, but it really seems like a completely false equivalence. Just because a sufficiently powerful simulation would be expected to be indistinguishable from reality doesn't imply that there's any reason to take seriously the idea that we're dealing with something "sufficiently powerful".
Human brains do things like language and reasoning on top of a giant ball of evolutionary mud - as such they do it inefficiently, and with a whole bunch of other stuff going on in the background. LLMs work along entirely different principles, working through statistically efficient summaries of a large corpus of language itself - there's little reason to posit that anything analogously experiential is going on.
If we were simulating brains and getting this kind of output, that would be a completely different kind of thing.
I also don't discount that other modes of "consciousness" are possible, it just seems like people are reasoning incorrectly backward from the apparent output of the systems we have now in ways that are logically insufficient for conclusions that seem implausible.
Unless you're being sarcastic, this is exactly the kind of surface-level false equivalence illogic I'm talking about. From my post:
> I also don't discount that other modes of "consciousness" are possible, it just seems like people are reasoning incorrectly backward from the apparent output of the systems we have now in ways that are logically insufficient for conclusions that seem implausible.
It's simulating, there's no real substance, except the "homunculus soul" that its human maker/owner injected into it.
If you asked it to simulate a pirate, it would simulate a pirate instead, and simulate a parrot sitting on its shoulder.
This is hard to discuss because it's so abstract. But imagine an embodied agent (robot), that can simulate pain if you kick it. There's no pain internally. There's just a simulation of it (because some human instructed it such). It's also wrong to assign any moral value to kicking (or not kicking) it (except as "destruction of property owned by another human" same as if you kick a car).