Hacker News | onlyrealcuzzo's comments

And here I am trying to get an LLM to add types to a 100k line Ruby repository for 2 days, and it's not going so hot...

I have some experience in this. Reach out (email in my bio) I would love to chat.

An SMT solver may work better.

Will that work if my codebase is full of nils that shouldn't be there, hashes with loosely defined schemas where structs belong, and arrays masquerading as tuples?
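
For what it's worth, a lot of the pain here is the loose-hash problem itself. A minimal sketch (hypothetical `User` example, plain Ruby `Struct`; Sorbet's `T::Struct` or an RBS signature works the same way) of pinning down one of those hashes so the schema and the nils become explicit:

```ruby
# Before: user = { "name" => "Ada", "age" => nil } — a type checker
# has nothing to work with, and any key may silently be missing.
# After: a struct with a declared, enforced schema.
User = Struct.new(:name, :age, keyword_init: true) do
  def initialize(name:, age: nil)
    # name is required; age is *declared* optional rather than
    # accidentally nil somewhere downstream.
    raise ArgumentError, "name is required" if name.nil?
    super(name: name, age: age)
  end
end

u = User.new(name: "Ada")
u.age  # => nil, but now an explicit part of the schema
```

Once the hashes are converted like this, an LLM (or a human) has something concrete to hang type annotations on.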

"He" writes tens of thousands of lines per week.

An LLM wrote a flight simulator in a language an LLM also wrote for you.

It's cool that you're doing all of this, and hopefully you and others get value out of it.

But it helps to be clear about who is doing what.

Just own it.

It's cool that AI can do this.


What's the reasoning that software companies don't have to count R&D into gross margins?

> They can become profitable anytime they want.

By cutting 1100 workers!


I find Gemini to be quite good / acceptable at code review, design, and design review, but it's notably far behind Claude Code for implementation.

Are you having better results?

Codex is fast and decent, but I REALLY have to stay on top of it. The number of times it makes executive design decisions on the fly and completely breaks everything is way too high.


I've used it with fairly wide open prompts and also detailed markdown specs and it has no problem making them perfectly, but good code quality requires a bit of follow up work.

I either vibe code a whole personal project, or strongly direct it to generate individual changes. It's fine for both.

The Pro model is the only good model for complex code and I think it's slower than Claude and Codex.


We've been automating stuff for 60 years, and it only leads to more automation.

At the end of the day, the more automation, the more people you need making sure things work.

There's always going to be a minimal bottleneck for how much an engineer can oversee if they need to do zero implementation.

We're not as far from that point as people think.

Most of the languages that most software is developed in today are 10x more expensive than the languages of yore.

Rust has a bad reputation for being hard, but it is actually quite expressive.

Less than 50% of what engineers do is code.

IBM was famous, in the early 2000s, for its average dev writing one line of code per day.

We're just going to move to a world where the average dev spends <10% of their time coding, but there's likely to be x times more work, so it mostly evens out.


I think you're saying the same thing OP said.

The point is, their default behavior is to ship crap fast.

You have a process to handle that.

So does OP.


> The problem is that people are now building our world around tooling that eschews accountability.

Management has been doing a wonderful job of eschewing accountability for decades.

It's a lot of people's dream to be able to say, "Yeah, our product doesn't work, but it's not OUR fault," and have the client just shrug, grumble "AI, AI, AI," and put up with it because they know they can't get better service anywhere else.

It's not MY fault my website is down: it's Amazon's! It's not MY fault my app doesn't work: it's Claude Code's!


Well just to be clear from a legal perspective, in the case of AI, as long as AI is "property", the owners, developers, and/or users will be held liable for things like the hypothetical fatal car accident that Sussman posits.

Currently, from a legal perspective, AI is considered a "tool" without legal persona. So you sue the developer, the owner, or the user of the AI. (Just kidding, any lawyer worth his/her salt will sue all three! But you get the point.)

Legally speaking, AI will probably be viewed that way for a long time. There are too many issues militating against viewing it any other way. Owners will not give up property rights; there is no will to overbear them. On and on and on.


This doesn't seem to be how it works in practice. "AI" or not, complex systems are a pretty good shield from accountability in practice today.

PSA. Do not listen to advice like this.

>complex systems are a pretty good shield from accountability in practice today.

Maybe complex legal systems are, but complex software systems offer you no such protection.

My field for the past few decades has been diagnostic medical software. In that field, the 510(k) you got effectively enters you into an ironclad agreement with the government. There's almost no way out of it. 510(k) clearances significantly simplify (for the government) holding you accountable: you have made attestations of suitability directly to the federal government. And the way our chief counsel explained it to us, each signature you sent to the government, for each feature that failed, is a single count of lying to the federal government.

Please, please, please people, don't listen to comments like the one above. Everything should be run by your qualified legal expert. Getting things right up front is so much easier than trying to fix things when the inevitable happens.

Alternatively, stick to fields free from regulation. That's also a viable strategy. But to just trust that the legal system is complicated and the technology you're deploying is complicated, so the feds will never get me? That's the start of a lot of really bad stories.


> I regularly ship four features at a time now across multiple projects.

Many people are missing the fact that LLMs allow ICs to start operating like managers.

You can manage 4 streams now. Within a couple years, you may be able to manage 10 streams like a typical manager does today.

IME, LLMs don't speed you up that much if 1) you're already an expert at what you're doing (inherently not scalable), 2) you're only working on one thing (doesn't make sense when you can manage multiple streams), or 3) you're doing something LLMs are particularly bad at (not many remaining coding tasks, but definitely still some).


A manager doesn't have to look at the code that's being shipped. An IC will still need to do that, and this will eventually take up much of their work. It can be addressed by moving up the stack to higher level and more strictly checked languages, where there's overall less stuff to review manually.

People typically think it's not a new person's fault if they come into a team and bring down production.

That's a failure of the existing infrastructure to allow someone to do this.

LLM coding will work like this.

If you're letting LLMs go wild with no system in place to automatically know they're moving in the right direction and "shipping" things up to your standards, the failure is you, not the LLM.


The dirty secret is all the people talking about shipping 4 features a day etc are just lying about reviewing anything. They don’t review it at all.

I didn't say shipping a day. I said shipping at the same time.

The review comes at the end, though I truly believe this will go away as well. Agents will also get better at review until they're good enough that no one will want to do it anyways. Good enough is good enough.


I review more thoroughly and faster with Claude than without.

Claude absolutely improves code review quality, but it still misses a lot. It's a second pair of eyes, it doesn't replace/remove the work you have to put in to fully review the code yourself.

It's like saying that you code reviewed faster just because someone else also reviewed the code, that's not how it works.


Agree, and with CC my volume and quality of PR review has substantially increased since 4.5. Without CC for review we would have a ridiculous bottleneck in our dev/qa pipeline.

I'm faster, sure, but more thorough, no. The same, because I was already very careful. But it's not a massive win either; 4.7 misses too much still because it would need to read too much of the context each time to understand the architectural problems I'm catching.

It's nice not to have to care about nits and other things we don't have lints for, though, so that's useful.


Spot on. When will the cretins understand, it's not about how much code you can generate.

Just like a manager, you don't need to look at the code. You need to set up quality systems to provide evidence the code does what it is supposed to do, just like a manager.

I've never met a manager who has set up "quality" systems to ensure that the job is done correctly. Their actions are always reactive, and not pertaining to code at all. The overarching contract is "You do a bad job, you will be fired."

Code review has a number of important purposes beyond merely verifying functionality. It's true that some managers don't recognize this, fail to allocate time for anything but feature work, and then wonder a few years later why the software is so buggy and new feature development is so hard.

A software engineer was always a manager.

Software engineers were always creating, maintaining, and updating automated business processes. In olden days we had "computers": rooms of people computing things. Those rooms of people have been replaced with code running on von Neumann machines.

The economic tension has always been a resistance to grant programmers status and class of management. Instead management wants to treat programmers like labor.


> It obviously went through lots of files in both prompts but total cost? Just $0.09 for the Pro version.

When people say that LLMs aren't worth it, it kills me.

A lot of us, on average, make $100+ an hour. $0.09 is < 4 seconds of our time.

You can't even read the vast majority of prompt responses that fast.

LLMs will continue to get better (I'm doubtful at previous rates, all indications are showing that progress is slowing and costs are increasing disproportionately).

It seems like >50% of devs think LLMs provide less than 0 value. I just do not get it.

Did they use an LLM once three years ago and decide it's never going to be worth it? Have they even tried? Or did they only ever try it on one giant, monolithic proprietary codebase where they're a total expert, decide an LLM isn't as good as them, and conclude it's "completely worthless"?

They are shockingly unhelpful on my company's codebase.

But that doesn't mean they are flat-out worthless.


I know I'm guilty of making this sort of argument sometimes, but it's just not valid.

I don't get paid for every waking hour of every day. Often I'm using an LLM for something that's uncompensated, so my hourly wage equivalent is irrelevant.

And for times when we might use an LLM for something related to paid work, it's still money out of your paycheck (unless the employer is paying for it; go nuts in that case). And it's not like using the LLM lets you go home early if it saves you time. You just end up doing more work.

I still use them because they're a useful tool sometimes. But I don't pretend it has negligible or no cost. (Not to mention the externalities around electricity use, crazy data center buildout, skyrocketing GPU and RAM prices, etc.)


I don't understand, your employer doesn't pay for your AI use? If my employer didn't pay for it I just wouldn't use it at all out of principle. Just as I don't buy my own work laptop

> You just end up doing more work.

Might want to dig into that one a bit deeper there.


Biggest issue with Opus for me is not so much that it's expensive (though it is), but the fact it's slow especially during US working hours.

I prefer using slightly worse but significantly quicker models on a tighter leash and iterating faster, feels more productive


100+ on average?! That hurt.

Very American centered POV

Second -> this seems like something that might be cool.

But as someone who's probably as close to your target audience as you can get -> it's not clear to me what exactly this does, and when I would need it.

That may mean I'm not in your target audience, but then I suspect that audience is very small.

My critique:

> Pollen is a self-organising mesh and WASM runtime written in pure Go. Workloads are "seeded" into the cluster and organically scale and follow load. There is no central coordinator; decisions are made deterministically, locally, using a gossiped CRDT runtime state as their source of truth. Same view of the world; same workload placement and routing.

Sentence one is fine. It could probably be less mumbo-jumbo-y.

Sentence 2 should be paragraph 2.

Your actual sentence 2 should be along the lines of: what is a self-organizing mesh, and when is it useful (IMO).

I also would suggest not using CRDT right away. I think you might have a lot of people that might be interested in this, but don't know exactly what that is or why it's useful.


> but don't know exactly what that is or why it's useful.

I hate to say it, but the only applications I can think of can be best categorized as illegitimate, likely clandestine, distributed computing tasks.


I'm unaware of a better solution for local-first / offline-first software syncing (which IMO should be more common)!

They're also great for collaborative text editing or if you're building a distributed database (not many people, but I'm in an adjacent field).

At MASSIVE scale (inherently not many people), they're also good for things people take for granted, like counting (and other things people don't take for granted).
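
The distributed-counting case is the easiest to see concretely. A minimal sketch (not Pollen's actual API) of a grow-only counter, the classic CRDT for counting without coordination: each node increments only its own slot, and merging takes the element-wise max, so replicas converge no matter what order gossip messages arrive in:

```ruby
class GCounter
  attr_reader :counts

  def initialize
    @counts = Hash.new(0)  # node_id => that node's local count
  end

  # Each node only ever increments its own entry.
  def increment(node_id, by = 1)
    @counts[node_id] += by
  end

  # The observed total is the sum over all nodes.
  def value
    @counts.values.sum
  end

  # Merge is commutative, associative, and idempotent, so it is
  # safe to gossip the same state repeatedly and in any order.
  def merge(other)
    other.counts.each { |node, n| @counts[node] = [@counts[node], n].max }
    self
  end
end
```

Two replicas can increment independently while partitioned, and merging in either direction (even repeatedly) yields the same total.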

Again, it's not clear to me exactly where and why Pollen helps in any of these scenarios.

