Hacker News | hyperadvanced's comments

I don’t think it’s “just” that easy. AI can be great at generating unit tests but it can and will also frequently silently hack said tests to make them pass rather than using them as good indicators of what the program is supposed to be doing.


> AI can be great at generating unit tests but it can and will also frequently silently hack said tests to make them pass rather than using them as good indicators of what the program is supposed to be doing.

Unit testing is my number one use case for gen AI in SWE. I just find the style / concept often slightly different than I would personally do, so I end up editing the whole thing.

But, it’s great at getting me past the unpleasant “activation energy threshold” of having a test written in the first place.


Totally. I’m a huge fan of it, but it rarely “just” works and I do have to babysit it to make sure it’s actually doing something good for the world


Am I stupid or do these agents regularly not read what’s in the agents.md file?


More recent models are better at reading and obeying constraints in AGENTS.md/CLAUDE.md.

GPT-5.2-Codex did a bad job of obeying my more detailed AGENTS.md files but GPT-5.3-Codex very evidently follows it well.


Perhaps I’m not using the latest and greatest in terms of models. I tend to avoid using tools that require excessive customization like this.

I find it infinitely frustrating to attempt to make these piece-of-shit “agents” do basic things like running the unit/integration tests after making changes.


Opus 4.5 successfully ignored the first line of my CLAUDE.md file last week


Thank god it’s not just me. It really makes me feel insane reading some of the commentary online.


Each agent uses a different file, like claude.md etc (maybe you already knew that).

And it requires a bit of prompt engineering like using caps for some stuff (ALWAYS), etc.


You’re not stupid. But the agents.md file is just an md file at the end of the day.

We’ve been acting as if it’s assembly code that the agents execute without question or confusion, but it’s just some more text.


Presumably one of these high profile people will eventually get pwned if the risks really are that high.


This is my understanding as well. If GPT made money the companies that run them would be publicly traded?

Furthermore, companies which are publicly traded show that overall the products are not economical. Meta and MSFT are great examples of this, though they have recently seen opposite sides of investors appraising their results. Notably, OpenAI and MSFT are more closely linked than any other Mag7 companies with an AI startup.

https://www.forbes.com/sites/phoebeliu/2025/11/10/openai-spe...


Going public is not a trivial thing for a company to do. You may want to bring in additional facts to support your thesis.


Fwiw I’m not making a comment on companies which have transitioned from private to public or doing any sort of strategic analysis related to that. I’m merely saying that if we index on publicly traded companies which have made substantial AI investments, the story isn’t super clear, i.e. as an investor there is not yet a solid positive thesis for the economics of LLMs that would rate the tech as a buy. It doesn’t help that information is sparse here and that AI investments in general are clustered around a few megacap stocks with other interests outside of AI writ large.


Going public also brings with it a lot of pesky reporting requirements and challenges. If it wasn't for the benefit of liquidity for shareholders, "nobody" would go public. If the bigger shareholders can get enough liquidity from private sales, or have a long enough time horizon, there's very little to be gained from going public.


Metacognition As A Service, you say?


Running on the Meta Cognition Protocol server near you.


You’ll get sued by Meta for this!


I think that’s called “consulting”.


This is a plain fact. The amount of obviously GPT’d text out there is so breathtaking. The good news is that all of this shit people are squeezing out of GPT is really reinvigorating my desire to work on novel creative projects.

In nearly every form of entertainment over the past 2 years, quality has degraded rapidly, to the point where most Reddit threads really aren’t worth reading - they’re all chock full of It’s Not X It’s Y! Most movies are heavily re-explained plots and reboots of old IP, CGI, dialogue glued on after the fact. Pop music has been on life support for 10 or 20 years, and that was before anyone could AI generate whatever derivative sonic slop they wanted to create. Books and publishers are holding strong to avoid GPT, but the desire for Dragon smut will ultimately overwhelm any aesthete’s preference for originality.


>In nearly every form of entertainment over the past 2 years

I mean, it really was on a huge downward trend before two years ago, and you hit on much of that.

Social media and the online advertising age had really destroyed a lot of entertainment well before AI was an issue. Bot-like humans would just copy and mish-mash existing media poorly and attempt to gather the ad dollars for themselves.

Honestly a lot of the enjoyable content I watch these days is from individuals/groups that aren't chasing ad views, but doing more 'donation begging' from their own audience which means they must maintain some level of quality for continued patronage.

One particular problem we seem to have is that we see AI as 'the problem' and not just one issue in a problematic system. Why doesn't Reddit do anything about crap posts and content? They make money off of convincing advertisers that lots of views happen on their site and they should sell ads. Why doesn't Google improve their services? They are a near monopoly in online advertising and service improvement for the end user doesn't increase their revenue.

It's just a system where one group is trying to extract wealth from Google/Facebook whereas Google/Facebook have nearly perfected extracting wealth from you.


Correct, this is a marketing document, not a government document or a legal agreement.


zsh is still crazy to me. I use bash for everything. I have no idea what I’m missing out on.


FWIW, I still use bash as well. Nothing against zsh per-se, it's just that I know bash, bash works, and there's no particular pain I experience using bash that will obviously be solved by switching. And when you factor in anticipated switching costs, I haven't found any compelling reason to spend any significant time on zsh so far.

Maybe one day though.


Yes, I'll throw my hat into this group too. Bash is fine.

YMMV but I have found using zsh too frictitious to be helpful. Sure, theoretically zsh living in a bash world (let's face it, all scripts are bash) is completely fine, but reality seems to differ. Copied a one-liner from shell history into your script? Crash. Use arrays? Weird bugs. Use shell builtins? Whoa unexpected interactivity!!! Etc...

Bash is absolutely fine as a default shell. As an added benefit, you don't feel like an invalid once logged in to a container or server.


I've been using it for the last 6 or 7 years and I can only remember one specific feature I use a lot: "unset HISTFILE" to disable history when I need to run commands with passwords.

Other than that, oh-my-zsh with git, systemd, and fzf plugins. Saves a lot of typing.

The main selling point for me is how easy it is to set up.


Space before the command will have the same effect
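Worth noting that the leading-space trick depends on shell configuration: bash only honors it when HISTCONTROL includes ignorespace (many distro default configs set this, but it is not guaranteed), and zsh needs the HIST_IGNORE_SPACE option. A minimal sketch of the bash side, assuming it goes in ~/.bashrc:

```shell
# ~/.bashrc fragment: keep commands typed with a leading space out of history.
# "ignoreboth" is shorthand for ignorespace plus ignoredups.
HISTCONTROL=ignoreboth

# zsh uses an option instead (in ~/.zshrc):
#   setopt HIST_IGNORE_SPACE
```

With that in place, a command typed with a leading space runs normally but is not recorded.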


Just out of curiosity, what sort of typing do those plugins save in comparison to doing it in bash? Can you give some examples?


The git and systemd ones create several aliases for frequent commands:

git commit -> gc

git status --short -b -> gsb

git checkout -> gco

systemctl --user restart -> scu-restart

Nothing that you couldn't come up with yourself, but I've been using them for so long that they have become a standard for me.

The fzf plugin enables a fuzzy finder when you hit ctrl+r or ctrl+t. You need fzf installed.
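For anyone staying on bash, here is a rough sketch of the same shortcuts as plain aliases. The expansions are copied from the list above; the actual oh-my-zsh plugin definitions may carry extra flags, so treat this as an approximation and not the plugins' exact behavior.

```shell
# ~/.bashrc fragment: hand-rolled versions of the aliases listed above.
alias gc='git commit'
alias gsb='git status --short -b'
alias gco='git checkout'
alias scu-restart='systemctl --user restart'
```

Arguments typed after an alias are passed through, so `gco main` runs `git checkout main`.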


I remember when I switched to zsh solely for SUNKEYBOARDHACK.


Prefixing a space on that command will keep it out of history.


I don't think it is crazy, but I know and love the bash quirks. I've got permanent history set up thanks to Eli Bendersky [1], which I know zsh has a solution to already. But what annoys me with zsh is some of the ways it tab completes when navigating the filesystem, and not by default allowing comments on the command line, e.g. '# github api key blahblahblah', which I can then pull later using phgrep.
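On the comments point: zsh does have an option for this, INTERACTIVE_COMMENTS, it is just off by default. A minimal sketch, assuming it lives in ~/.zshrc (the ZSH_VERSION guard is only there so the fragment stays harmless if a bash session happens to source it):

```shell
# ~/.zshrc fragment; the guard makes this a no-op in other shells.
if [ -n "${ZSH_VERSION:-}" ]; then
  # Treat '#' as the start of a comment on the interactive command line.
  setopt INTERACTIVE_COMMENTS
fi
```

After that, a line such as '# github api key blahblahblah' is accepted as a comment instead of being run as a command.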

Slight pain on a Mac to get the latest version and use it as terminal shell, but it gets easier every time I work on a fresh Mac.

[1]: https://eli.thegreenplace.net/2013/06/11/keeping-persistent-...


If you want vi history editing like you are used to in bash for the last 20 years it's subtly different in a manner that makes it insanity inducing. If you use the traditional emacs like editing it's much the same.


Oh that’s good. I’m an emacs guy. I don’t like change.


I’m a hobbyist math guy (with a math degree) and LLMs can at least talk a little talk or entertain random attempts at proofs I make. In general they rebuke my more wild attempts, and will lead me to well-trodden answers for solved problems. I generally enjoy (as a hobby) finding fun or surprising solutions to basic problems more than solving novel maths, so LLMs are fun for me.


I’m strictly talking about “Agentic” coding here:

They are not a silver bullet or truly “you don’t need to know how to code anymore” tools. I’ve done a ton of work with Claude Code this year. I’ve gone from a “maybe one ticket a week” tier React developer to someone who’s shipped entire new frontend feature sets, while also managing a team. I’ve used LLMs to prototype these features rapidly and tear down the barrier to entry on a lot of simple problems that are historically too big to be a single-dev item, and clear out the backlog of “nice to haves” that compete with the real meat and bread of my business. This prototyping and “good enough” development has been massively impactful in my small org, where the hard problems come from complex interactions between distributed systems, monitoring across services, and lots of low-level machine traffic. LLMs let me solve easy problems and spend my most productive hours working with people to break down the hard problems into easy problems that I can solve later or pass off to someone on my team to help.

I’ve also used LLMs to get into other people’s codebases, refactor ancient tech debt, and shore up test suites from years ago that are filled with garbage and copy/paste. On testing alone, LLMs are super valuable for throwing edge cases at your code and seeing what you assumed vs. what an entropy machine would throw at it.

LLMs absolutely are not a 10x improvement in productivity on their own. They 100% cannot solve some problems in a sensible, tractable way, and they frequently do stupid things that waste time and would ruin a poor developer’s attempts at software engineering. However, they absolutely also lower the barrier to entry and dethrone “pure single tech” (ie backend only, frontend only, “I don’t know Kubernetes”, or other limited scope) software engineers who’ve previously benefited from super specialized knowledge guarding their place in the business.

Software as a discipline has shifted so far from “build functional, safe systems that solve problems” to “I make 200k bike shedding JIRA tickets that require an army of product people to come up with and manage” that LLMs can be valuable if only for their capabilities to role-compress and give people with a sense of ownership the tools they need to operate like a whole team would 10 years ago.


> However, they absolutely also lower the barrier to entry and dethrone “pure single tech” (ie backend only, frontend only, “I don’t know Kubernetes”, or other limited scope) software engineers who’ve previously benefited from super specialized knowledge guarding their place in the business.

This argument gets repeated frequently, but to me it seems to be missing a final, actionable conclusion.

If one "doesn't know Kubernetes", what exactly are they supposed to do now, having an LLM at hand, in a professional setting? They still "can't" assess the quality of the output, after all. They can't just ask the model, as they can't know if the answer is not misleading.

Assuming we are not expecting people to operate with implicit delegation of responsibility to the LLM (something that is ultimately not possible anyway - taking blame is a privilege humans will keep for the foreseeable future), I guess the argument in the form as above collapses to "it's easier to learn new things now"?

But this does not eliminate (or reduce) a need for specialization of knowledge on the employee side, and there is only so much you can specialize in.

The bottleneck may have shifted right somewhat (from time/effort of the learning stage to the cognition and the memory limits of an individual), but the output on the other side of the funnel (of learn->understand->operate->take-responsibility-for) didn't necessarily widen that much, one could argue.


> If one "doesn't know Kubernetes", what exactly are they supposed to do now, having an LLM at hand, in a professional setting? They still "can't" assess the quality of the output, after all. They can't just ask the model, as they can't know if the answer is not misleading.

This is the fundamental problem that all these cowboy devs do not even consider. They talk about churning out huge amounts of code as if it were an intrinsically good thing. Reminds me of those awful VB6 desktop apps people kept churning out. VB6 sure made tons of people n times more productive, but it also led to loads of legacy systems that no one wanted to touch because they were built by people who didn't know what they were doing. LLMs-for-code are another tool in the same category.


>They still "can't" assess the quality of the output, after all. They can't just ask the model, as they can't know if the answer is not misleading.

Wasn't this a problem before AI? If I took a book or online tutorial and followed it, could I be sure it was teaching me the right thing? I would need to make sure I understood it, that it made sense, that it worked when I changed things around, and would need to combine multiple sources. That still needs to be done. You can ask the model, and you'll have to judge the answer, same as if you asked another human. You have to make sure you are in a realm where you are learning, but aren't so far out that you can easily be misled. You do need to test out explanations and seek multiple sources, of which AI is only one.

An AI can hallucinate and just make things up, but the chance that different sessions with different AIs lead to the same hallucinations, consistently building upon each other, is low enough to not be worth worrying about.


I don’t think the conclusion is right. Your org might still require enough React knowledge to keep you gainfully employed as a pure React dev, but if all you did was change some forms, this is now something pretty much anyone can do. The value of good FE architecture has increased if anything, since you will be adding code quicker. Making sure the LLM doesn’t stupidly couple stuff together is quite important for long-term success.


If you don’t know k8s, or any tech really, you can RTFM, you can generate or apply some premade manifests, you can feed the errors into the LLM and ask about it, you can google the error message, you can do a lot of things. Often times, in the “real world” of software engineering, you learn by having zero idea of how to do something to start with and gradually come up with ideas from screwing around with a particular tool or prototyping a solution and seeing how well it works.

I agree that some of the above basically amounts to: it’s easier to learn new things. Which itself might sound ho-hum, but it really is a fundamental responsibility of software engineers to learn new things, understand new and complex problems, and learn how to do it correctly and repeatably. LLMs unquestionably help with this, even with their tendency to hallucinate: usually proof by contradiction (or the failure of an over-confident chaos machine) is even better than just having a thing that spits out perfect solutions without needing the operator to understand it.

However, I will say that there is a very large gulf between learning how to reason about complex systems or code and learning how to use the entropy machine to produce nominally acceptable work. Pure reliance and delegation of responsibility to the AI will torpedo a lot of projects that a good engineer could solve, and no amount of lines of code makes up for a poorly conceived product or a brittle implementation that the LLM later stumbles over. Good engineering principles are more important than ever, and the developer has to force the LLM to conform to those.

There are many things to question about agentic coding: whether it’s truly cost/effort effective, whether it saves time, whether it makes you worse at problem solving by handing you facile half-solutions that wither in the face of the chaos of the real world, etc. But they clearly aren’t a technology which “doesn’t do ANYTHING useful”, as some HN posters claim.


It really depends on whether coding agents are closer to a "compiler" or not. Very few amongst us verify assembly code. If the program runs and does the thing, we just assume it did the right thing.


> someone who’s shipped entire new frontend feature sets, while also managing a team. I’ve used LLM to prototype these features rapidly and tear down the barrier to entry on a lot of simple problems that are historically too big to be a single-dev item, and clear out the backlog of “nice to haves” that compete with the real meat and bread of my business. This prototyping and “good enough” development has been massively impactful in my small org

Has any senior React dev code-reviewed your work? I would be very interested to see what they have to say about the quality of your code. It's a bit like using LLMs to medically self-diagnose and claiming it works because you are healthy.

Ironically enough, it does seem that the only workforce AIs will be shrinking will be devs themselves. I guess in 2025, everyone can finally code


That's a solid answer, I like it, thanks!

