Bad code has real world consequences. Its not limited to having to rewrite it. The cost might also include sanctions, lost users, attrition, and other negative consequences you don’t just measure in dev hours
Right, but that cost is also incurred by human-written code that happens to have bugs.
In theory experienced humans introduce less bugs. That sounds reasonable and believable, but anyone who's ever been paid to write software knows that finding reliable humans is not an easy task unless you're at a large established company.
Well, if you keep in mind that "professionals" means "people paid to write code" then LLMs have been generating code at the same quality OR BETTER for about a year now. Most code sucks.
If you compare it to beautiful code written by true experts, then obviously not, but that kind of code isn't what makes the world go 'round.
We should qualify that kind of statement, as it’s valuable to define just what percentile of “professional developers” the quality falls into. It will likely never replace p90 developers for example, but it’s better than somewhere between there and p10. Arbitrary numbers for examples.
Can you quantify the quality of a p90 or p10 developer?
I would frame it differently. There are developers successfully shipping product X. Those developer are, on average, as skilled as necessary to work on project X. else they would have moved on or the project would have failed.
Can LLMs produce the same level of quality as project X developers? The only projects I know of where this is true are toy and hobby projects.
> Can you quantify the quality of a p90 or p10 developer?
Of course not, you have switched “quality” in this statement to modify the developer instead of their work. Regarding the work, each project, as you agree with me on from your reply, has an average quality for its code. Some developers bring that down on the whole, others bring it up. An LLM would have a place somewhere on that spectrum.
In a one-shot scenario, I agree. But LLMs make iteration much faster. So the comparison is not really between an AI and an experienced dev coding by hand, it's between the dev iterating with an LLM and the dev iterating by hand. And the former can produce high-quality code much faster than the latter.
The question is, what happens when you have a middling dev iterating with an LLM? And in that case, the drop in quality is probably non-linear---it can get pretty bad, pretty fast.
There was a recent study posted here that showed AI introduces regressions at an alarming rate, all but one above 50%, which indicates they spend a lot of time fixing their own mistakes. You've probably seen them doing this kind of thing, making one change that breaks another, going and adjusting that thing, not realizing that's making things worse.
The study is likely "SWE-CI: Evaluating Agent Capabilities in Maintaining Codebases via Continuous Integration". Regression rate plot is figure 6.
Read the study to understand what it is measuring and how it was measured. As I understand parent's summary is fine, but you want to understand it first before repeating it to others.
Bentley Software is proof that you can ship products with massive, embarrassing defects and never lose a customer. I can’t explain enterprise software procurement, but I can guarantee you product quality is not part of that equation.