
The big AI labs are almost certainly selling inference below cost and burning mountains of money. With the insane increase in hardware prices, running models locally just doesn’t make any financial sense.



Nobody is saying it makes "financial sense", it's about control.

I have always taken plenty of care to avoid becoming dependent on big tech for my lifestyle. I've succeeded in some areas and failed in others.

But now AI is a part of so many things I do, and I'm concerned about it. I'm dependent on Android, but I know that with a bit of focus I have a clear route to escape it. Ditto with Gmail. But I don't actually know what I'd do tomorrow if Gemini stopped serving my needs.

I think for those of us who _can_ afford the hardware, it is probably a good investment to start learning and exploring.

One particular thing I'm concerned about: right now I use AI exclusively through the clients Google picked for me, because it makes financial sense. (You don't seem to get free bubble money if you buy tokens via API billing, only via consumer accounts.) This makes me a bit of a sheep, and it feels bad. There's so much innovation happening, and basically I only benefit from it in the ways Google chooses.

(Admittedly I don't need local models to fix that particular issue, maybe I should just start paying the actual cost for tokens).


Just use an open-weight model like GLM-5 behind an aggregator (OpenRouter, NanoGPT) then. That is a commodity market right now.
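
A minimal sketch of what that looks like, assuming OpenRouter's OpenAI-compatible endpoint and the official openai Python client (the model slug is illustrative; check openrouter.ai/models for the current GLM identifier):

    # Sketch: calling an open-weight model through OpenRouter's
    # OpenAI-compatible API. Model slug and key are placeholders.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://openrouter.ai/api/v1",  # OpenRouter endpoint
        api_key="sk-or-...",                      # your OpenRouter key
    )

    resp = client.chat.completions.create(
        model="z-ai/glm-4.5-air",  # assumption: whichever GLM slug OpenRouter lists
        messages=[{"role": "user", "content": "Hello"}],
    )
    print(resp.choices[0].message.content)

Because it's the standard OpenAI-shaped API, swapping aggregators or models is a one-line change, which is exactly what makes it a commodity market.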

It's a luxury for the wealthy, to be honest. At least for now. These prices are ridiculous.

Apparently inference itself is profitable, at least according to an interview I watched with Dario. It even covers the cost of training, if you look at it on a model-by-model basis.

The cash burn comes from models ballooning in size: they spend (as an example, not actual numbers) $100M on training + inference for the lifetime of Sonnet 3.5, make $200M from subscriptions/API keys while it's SOTA, but then have to somehow come up with $1B to train Opus 4.0.

To run some other back-of-the-envelope calcs: GLM 4.7 Air (the previous "good" local LLM) can generate ~70 tok/s on a Mac Mini. That equates to ~2,200 million tokens per year.

OpenRouter charges $0.40 per million tokens, so theoretically, if you were running that Mac Mini at 100% utilisation, you'd be generating $880 per annum "worth" of API usage.

Assuming a power draw of something like 50W, you're only looking at ~440 kWh per annum. At 20c per kWh that's ~$90 of power, plus $499 for the hardware itself. Depreciate that $499 over 3 years and you're looking at ~$260 per year to generate ~$880 in inference "income".
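
Spelling that out as a quick script (same assumptions as above; the inputs are the figures in this comment, not measurements):

    # Back-of-envelope economics for running a local model 24/7.
    SECONDS_PER_YEAR = 365 * 24 * 3600

    tok_per_s = 70            # assumed GLM 4.7 Air throughput on a Mac Mini
    usd_per_m_tokens = 0.40   # OpenRouter price per million tokens
    power_w = 50              # assumed average power draw
    usd_per_kwh = 0.20        # electricity price
    hardware_usd = 499        # Mac Mini cost, depreciated over 3 years

    tokens_per_year = tok_per_s * SECONDS_PER_YEAR              # ~2,208M
    api_value = tokens_per_year / 1e6 * usd_per_m_tokens        # ~$883
    energy_kwh = power_w / 1000 * 24 * 365                      # ~438 kWh
    yearly_cost = energy_kwh * usd_per_kwh + hardware_usd / 3   # ~$254

    print(f"{tokens_per_year / 1e6:,.0f}M tokens/yr, "
          f"${api_value:,.0f} API value, ${yearly_cost:,.0f} yearly cost")

Note the comparison only holds at 100% utilisation; a box that sits idle most of the day generates far less "worth" per dollar of depreciation.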


We are not in this thread because of finances, but because of safety from oppressive governments and bad big corps. It's up to you to decide the price of your own safety.

RAM and storage price increases due to the AI bubble have certainly raised the cost of entry, but once you have the hardware, running models locally does make financial sense, especially if you have home solar capacity sufficient to power it. You can't get a much lower running cost than free.


