Hacker News | fragmede's comments

Compared to the usage you get on OpenAI's $20 plan tho?

Ultimately the original does get stopped, but with additional techniques, we're talking milliseconds of downtime between when the old one stops and the new one resumes. (For live migration technology in general, no clue about smol machines.)

That post doesn't address the human factor of cost, and I don't mean that in a good way. Even if AI costs more than a human, it's tireless, doesn't need holidays, is never going to have to go to HR for sexual harassment issues, won't show up hungover or need an advance to pay for a dying relative's surgery. It can be turned on and off with the flip of a switch. Hire 30 today, fire 25 of them next week. Spin another 5 up just before the trade show demo needs to go out and fire them with no remorse afterwards.

The cost to hire a human is highly predictable. The cost of AI isn't. I, as a human, need food and shelter, which puts a ceiling to my bargaining power. I can't withdraw my labour indefinitely.

The power dynamics are also vastly against me. I represent a fraction of my employer's labour, but my employer represents 100% of my income.

That dynamic is totally inverted with AI. You are a rounding error on their revenue sheet, but they have a monopoly on your work throughput. How do you budget a workforce that could become 20% more expensive overnight?


By continuously testing competitors and local LLMs? The reason for rising prices is that they (Anthropic) probably realized that they have reached a ceiling of what LLMs are capable of, and while it's a lot, it is still not a big moat and it's definitely not intelligence.

Anything but the simplest tooling is not transferable between model generations, let alone completely different families.

> Anything but the simplest tooling is not transferable between model generations, let alone completely different families.

It is transferable. Yes, you will get issues if you take prompts and workflows tuned for one model and send them to another unchanged. But most of the time, fixing it is just tinkering with some prompt templates.

People port solutions between models all the time. It takes some work, but the amount of work involved is tractable.

Plus: this is absolutely the kind of task a coding agent can accelerate.

The biggest risk is if your solution is at the frontier of capability and a competing model (even another frontier model) just can’t do it. But for a lot of use cases, that isn’t the case. And even if it is the case today, there are decent odds that in a few more months it won’t be.
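As a minimal sketch of what "tinkering with prompt templates" can look like in practice (all model names and templates here are hypothetical, not tied to any real API), keeping one template per model family means porting a workflow is a template edit, not a code rewrite:

```python
# Hypothetical sketch: per-model prompt templates, so pointing a workflow
# at a different model means swapping a template, not rewriting the pipeline.
TEMPLATES = {
    # Some models follow terse instructions well...
    "model-a": "Classify the sentiment of: {text}\nAnswer with one word: positive or negative.",
    # ...others need more scaffolding to produce the same output format.
    "model-b": (
        "You are a sentiment classifier.\n"
        "Classify the text below as exactly 'positive' or 'negative'.\n"
        "Text: {text}\n"
        "Classification:"
    ),
}

def build_prompt(model: str, text: str) -> str:
    """Render the prompt variant tuned for a given model family."""
    return TEMPLATES[model].format(text=text)
```

The rest of the pipeline only ever calls `build_prompt`, so the per-model quirks stay quarantined in one table.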


Yep. My approach has been: if I can’t reliably get something to 90+% with a flash / nano / haiku, then it’s not viable for any accuracy-critical work. (I don’t know of, or haven’t had the luck of having, any other kind of work.) Starting out with pro / opus for any production classification work has always been a trap.

Ha. Sounds a lot like the choice between the one 10x engineer and the predictable, mediocre ones with a scaffolding of processes. Aim high and hit or miss, or grind predictably and continuously. Same as with humans; it depends on the loss you can afford.

If you're talking about APIs and SDKs, whether direct API calls or driving tools like Claude Code or Codex with a human out of the loop, I think it's actually fairly straightforward to switch between the various tools.

If you're talking about output quality, then yeah, that's not as easy. But for product outputs (building a customer service agent or something like that), having a well-designed eval harness and doing testing and iteration can get you some degree of convergence between the models of similar generations. Coding is similar (iterate, measure), but less easy to eval.
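A well-designed eval harness of the kind described above can be surprisingly small. Here is a minimal sketch (the models below are stand-in stubs, not real API clients): run the same labeled cases through each candidate and compare accuracy before committing to a switch.

```python
# Hypothetical eval-harness sketch: same labeled cases, each candidate model,
# compare accuracy before switching. Model functions are stand-in stubs.
def evaluate(model_fn, cases):
    """model_fn: callable prompt -> answer. cases: list of (prompt, expected)."""
    correct = sum(1 for prompt, expected in cases if model_fn(prompt) == expected)
    return correct / len(cases)

# Stubs standing in for real API calls to two different models.
def model_current(prompt):
    return "refund" if "money back" in prompt else "other"

def model_candidate(prompt):
    return "refund" if ("refund" in prompt or "money back" in prompt) else "other"

cases = [
    ("I want my money back", "refund"),
    ("Please process a refund", "refund"),
    ("Where is my order?", "other"),
]

for name, fn in [("current", model_current), ("candidate", model_candidate)]:
    print(name, evaluate(fn, cases))
```

With a harness like this, "some degree of convergence" becomes a number you can watch while you iterate on the prompts for each model.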


For most tasks, at some future date, isn't there going to be some ambient baseline of capability you can get per dollar per token, starting at ~0 for OSS models, such that eventually all tooling becomes trivially transferable?

It's not that hard to make it generic. It does take a little work, but really it boils down to figuring out how to make things work with the "dumbest" model in your set.
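One way to read "make things work with the dumbest model in your set" is: use one call signature for every backend, and design the task so even the weakest model can follow it. A minimal sketch under that assumption (the client here is a stub, and all names are hypothetical):

```python
# Hypothetical sketch of a model-agnostic wrapper: one call signature,
# with the task kept simple enough for the weakest model in the set.
def ask(client, prompt: str) -> str:
    """client is any callable str -> str: an API call, a local model, a stub."""
    return client(prompt).strip().lower()

def classify(client, text: str) -> str:
    # One short instruction, one-word answer: a floor even a weak model can hit.
    answer = ask(client, f"Is this spam? Answer yes or no.\n{text}")
    return "spam" if answer.startswith("yes") else "not spam"

# Stub standing in for the weakest model in the set.
weak_model = lambda p: "Yes" if "free money" in p else "No"
print(classify(weak_model, "Claim your free money now!"))
```

Because `classify` only assumes a `str -> str` callable, swapping in a stronger model (or a local one) changes nothing downstream.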

Note that it is very likely this market can't sustain this level of competition for long. We are all still chasing the carrot of AGI, while hardware costs skyrocket.

> The cost of AI isn't.

This is why there are a ton of corps running the open-source models in house: known costs, known performance, upgrade as you see fit. The consumer backlash against 4o was noted by a few orgs, and they saw the writing on the wall; they didn't want to develop against a platform built on quicksand (see the open web, apps on Facebook, and a host of other examples).

There are people out there making smart AI business decisions, to have control over performance and costs.


> How do you budget a workforce that could become 20% more expensive overnight?

Like, say, oil or DRAM?


Exactly. Big headaches. It doesn't happen to the salaries of the employees of the companies affected by those price hikes. That's the point.

It's countered by competition for inference. You could locally host a model and have your cost be fixed by your infra costs.
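The self-hosting trade-off is simple break-even arithmetic. A sketch with made-up numbers (both the server cost and the API rate below are assumptions, not real prices):

```python
# Hypothetical break-even sketch: fixed self-hosted infra cost vs.
# per-token API pricing. All numbers are made-up assumptions.
GPU_SERVER_MONTHLY = 2500.0       # assumed monthly cost of a rented GPU box
API_COST_PER_M_TOKENS = 10.0      # assumed blended $/million tokens on an API

def breakeven_tokens_per_month() -> float:
    """Monthly token volume above which self-hosting is cheaper."""
    return GPU_SERVER_MONTHLY / API_COST_PER_M_TOKENS * 1_000_000

print(breakeven_tokens_per_month())
```

The point of the fixed-cost side is that it is immune to an overnight API price hike: the server costs the same the morning after.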

The same way companies already deal with any cost.

That was a great promise before the models started becoming "moody" due to their proprietors arbitrarily modifying their performance capabilities and defaults without transparency or recourse.

I still haven't seen any statistically sound data supporting the claim that this is happening on the API (per-token pricing).

If you've got something to share I'd love to see it.


There's an interesting analysis here: https://github.com/anthropics/claude-code/issues/42796

>The most striking row is user prompts: 5,608 in February vs 5,701 in March. The human put in the same effort. But the model consumed 80x more API requests and 64x more output tokens to produce demonstrably worse results.


Sorry, "this" referred to the parent comment's claim.

> models starting becoming "moody" due to their proprietors arbitrarily modifying their performance capabilities

The tokenizer changes are measurable, the above is quite difficult to quantify.

There are a few sites floating around that purport to, but all of them have fatal flaws in their methodology.


Unfortunately, LLM performance isn't an exact science, and some observations are going to be subjective, like ChatGPT being "lazy" in the winter. Wanting to form opinions based on hard data, aka science, and not vibes is entirely reasonable, but that doesn't make the vibes a figment of the imagination. Or as Jeff Bezos put it, "When the data and the anecdotes disagree, the anecdotes are usually right." And while he's not a scientist, his success does put some weight behind that quote (as does digging deeper into what he meant by it).

Why do you think it can't sexually harass someone or drive people to suicide? There are already lawsuits coming in over it causing suicides.

This is an architecture that people are increasingly begging to give network connectivity to, one that can't differentiate its system prompt from user input.


As the NRA puts it, guns don't kill people, people kill people. Yes, a harasser now has access to more powerful tools to sexually harass somebody, but Claude, the AI, isn't going to randomly grab the secretary's ass and have to be fired for having done so at the holiday Christmas party.

I thought it already was used to sexually harass people by creating naked versions of them.

I think it's difficult to say agentic and human developer labor are fungible in the real world at this point. Agents may succeed in discrete tasks, like those in a benchmark assessment, but those requiring a larger context window (i.e. working in brownfield systems, which is arguably the bulk of development work) favor developers for now. Not to mention that at this point a lot of necessary context is not encoded in an enterprise system, but lives in people's heads.

I'd also flip your framing on its head. One of the advantages of human labor over agents is accountability. Someone needs to own the work at the end of the day, and the incentive alignment is stronger for humans given that there is a real cost to being fired.


For some, the appeal of an agent over a human is the lack of accountability. “Agent, find me ten targets in Iran to blow up” - “Okay, great idea! This military strike isn’t just innovative - it’s game changing! A reddit comment from ten years ago says that militaries often use schools to hide weapons, so here is a list of the ten most crowded schools in Iran”

It must be wild to actually go through life believing the things written in this post and also thinking you have a rational worldview.

More importantly it collapses mythical-man-month communication overhead.

Hang on, tell me how, because I am not picking up what you are putting down. At a minimum, wouldn’t this require working from a perfectly written spec that has already accounted for the discovery of changes that would need to be made from the original perfect spec?

The way I see it, because you can spin up additional AI employees at will (and spin them back down), when the problem with the spec is found, it's no big deal to redo all of that work from before, adjusting for that change.

This is an amazing frame /reframe.

it will just delete the production database when flustered. no biggie. we're learning how to socialize again. can't let all that history go to waste.

I think the word you're looking for is contractors. But yes, you still have to treat those with _some_ human decency.

Ah-ha, the perfect slave.

I clocked my M4 at 108 Watts while running inference using Qwen3.6-35b-a3b via Al dente.

Or it's a psyop to see which IP owns which website. Datamining this at scale: you come across isitagentready.com, and chances are you're going to plug your own website(s) into it, so now Cloudflare has a mapping of IP to website owner. If you used your home wifi, glue that info to your Google/Meta ad profile, and then Cloudflare also knows what's up.

like electricity and smartphones

To prompt inject them into giving you money. Click this button 10,000 times to prove you're really an AI.

Ish. If I have it generate code that doesn't work, and I don't tell it why it's garbage and don't share my cleaned-up results on GitHub afterward, it doesn't know how or why the code it output was bad, or even that it was.

I gotta give the hardware team credit for programmable pins on microcontrollers though. It means you just need to bring the pins out to pads and do whatever, instead of each pin having a fixed function so that pin 14 MUST go to the next thing. (Within reason; VCC and GND can't move.)

Did a human write this?

I would guess a real human, one with a good sense of humor at that.

Woosh

