zargon's comments

> I have much greater confidence on eBay that I'm not going to get something counterfeit or unsafe.

On eBay, sellers control their inventory, listings, customer service, and resulting reputation end-to-end. On Amazon, where stock from different sellers can be commingled in the same bins, the incentives are backwards.


LLM slop.

I want to try Zed, but it's just too much of a supply chain attack waiting to happen. https://github.com/zed-industries/zed/issues/12589

I did install it in a VM with virtio-gpu, but it was absurdly slow, so I wasn't able to try it.


Had similar concerns, but just noticed they seem to be taking this more seriously now: https://zed.dev/blog/secure-by-default

Seems not applicable, given it will still download and install random LSPs for you without asking.

Is that directly related to the GitHub issue? Or you just mean that they're taking security more seriously?

I searched the article you linked to see if it addresses the GH issue in any way, but it doesn't seem to.


Also from Phoronix, a comparison with the AMD R9700 and RTX 6000 Ada (because Nvidia has not sent them a Blackwell card): https://www.phoronix.com/review/intel-arc-pro-b70/2

It's only in preview right now. And anyway, yes, models regularly get training updates.

But in this case, it's more likely just a tooling issue.


I think you mean ollama vs llama.cpp.

I do!

Damn autocorrect :)


I call it autocorrupt :)

Flash is less than 160 GB. No need to quantize to fit in 2x 96 GB. Not sure how much context fits in 30 GB, but it should be a good amount.
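
A back-of-the-envelope sketch in Python (the per-token KV cache cost is an assumption here; the real figure depends on layer count, KV head dimensions, and cache precision):

    # Rough estimate: how many tokens of context fit in leftover VRAM.
    # kv_kb_per_token is a placeholder, not a published figure.
    def context_tokens(free_gb: float, kv_kb_per_token: float) -> int:
        return int(free_gb * 1e9 / (kv_kb_per_token * 1e3))

    # If the cache cost ~100 KB/token (assumed), 30 GB of headroom holds:
    print(context_tokens(30, 100))  # 300000 tokens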

It seems to be 160 GB at mixed FP4+FP8 precision, FYI. Full FP8 is 250 GB+. (B)F16 would be around double that, I'd assume.

There is no BF16. There is no FP8 for the instruct model. The instruct model at full precision is 160 GB (mixed FP4 and FP8). The base model at full precision is 284 GB (FP8). Almost everyone is going to use instruct. But I do love to see base models released.
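
Those sizes are easy to sanity-check: FP8 is one byte per parameter, so a 284 GB base model implies roughly 284B parameters, and 160 GB for the same parameter count averages out to about 4.5 bits per weight, consistent with mostly-FP4 storage. A minimal sketch:

    # Sanity-check the reported sizes from parameter count and precision.
    params_b = 284  # billions, implied by the 284 GB FP8 base model

    base_gb = params_b * 8 / 8  # FP8 = 1 byte/param -> 284 GB
    avg_bits = 160e9 * 8 / (params_b * 1e9)  # instruct model

    print(base_gb)   # 284.0
    print(avg_bits)  # ~4.5 bits/param -> mostly FP4, some FP8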

> ~100GB at 16 bit or ~50GB at 8bit quantized.

V4 is natively mixed FP4 and FP8, so significantly less than that: 50 GB max, unquantized.


That article is a total hallucination.

"671B total / 37B active"

"Full precision (BF16)"

And they claim they ran this non-existent model on vLLM and SGLang over a month and a half ago.

It's clickbait keyword slop filled in with V3 specs. Most of the web is slop like this now. Sigh.


The Flash version is 284B A13B in mixed FP8/FP4, and the full native-precision weights total approximately 154 GB. The KV cache is said to take 10% as much space as V3's. This looks very accessible for people running "large" local models. It's a nice follow-up to the Gemma 4 and Qwen3.5 small local models.
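
For a sense of why an A13B model is so accessible: each generated token only has to read the ~13B active parameters, so the bandwidth-bound decode ceiling is memory bandwidth divided by active-weight bytes. A hedged sketch; the ~4.3 bits/param average follows from 154 GB / 284B params, but the 1 TB/s bandwidth is an assumed example machine, and real throughput will be lower:

    # Bandwidth-bound decode ceiling for a sparse MoE ("284B A13B").
    def max_tokens_per_s(active_params_b, avg_bits, bandwidth_gb_s):
        active_bytes = active_params_b * 1e9 * avg_bits / 8
        return bandwidth_gb_s * 1e9 / active_bytes

    # 154 GB / 284B params ~= 4.3 bits/param; 1 TB/s is assumed.
    print(max_tokens_per_s(13, 4.3, 1000))  # ~143 tok/s upper bound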

The price is appealing to me. I've been using Gemini 3 Flash mainly for chat. I may give this a try.

input/output: $0.14 / $0.28 (vs Gemini's $0.50 / $3)

Does anyone know why output prices have such a big gap?


Output is where the compute goes: generating tokens takes far more hardware time than prompt processing (input), which is much faster.

Input tokens are processed at 10-50x the speed of output tokens, since they can be processed in batches rather than one at a time like output tokens.
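
A toy numpy sketch of the difference (the shapes and the tanh "forward step" are made up; it just contrasts one batched matmul for prefill against a sequential decode loop):

    import time
    import numpy as np

    d = 2048  # hypothetical hidden size
    W = np.random.randn(d, d).astype(np.float32)
    prompt = np.random.randn(1000, d).astype(np.float32)  # 1000 input tokens

    t0 = time.perf_counter()
    _ = prompt @ W  # prefill: all 1000 tokens in one batched matmul
    t1 = time.perf_counter()

    x = prompt[-1]
    for _ in range(1000):  # decode: 1000 sequential steps
        x = np.tanh(x @ W)  # each output feeds back as the next input
    t2 = time.perf_counter()

    print(f"prefill {t1 - t0:.3f}s vs decode {t2 - t1:.3f}s")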

I'm going to blow my bandwidth allowance again this month, aren't I?
