zargon's comments

> I have much greater confidence on eBay that I'm not going to get something counterfeit or unsafe.

On eBay, sellers control their inventory, listings, customer service, and resulting reputation end-to-end. On Amazon, where stock from different sellers can be commingled in the same bins, the incentives are backwards.


LLM slop.

I want to try Zed, but it's just too much of a supply chain attack waiting to happen. https://github.com/zed-industries/zed/issues/12589

I did install it in a VM with virtio-gpu, but it was absurdly slow, so I wasn't able to try it.


Had similar concerns, but just noticed they seem to be taking this more seriously now: https://zed.dev/blog/secure-by-default

Seems not applicable, given it will still download and install random LSPs for you without asking.

Is that directly related to the GitHub issue? Or you just mean that they're taking security more seriously?

I searched the article you linked to see if it addresses the GH issue in any way, but it doesn't seem to.


Also from Phoronix, a comparison with the AMD R9700 and RTX 6000 Ada (because Nvidia has not sent them a Blackwell card): https://www.phoronix.com/review/intel-arc-pro-b70/2

It's only in preview right now. And anyway, yes, models regularly get training updates.

But in this case, it's more likely just a tooling issue.


I think you mean ollama vs llama.cpp.

I do!

Damn autocorrect :)


I call it autocorrupt :)

Flash is less than 160 GB. No need to quantize to fit in 2x 96 GB. Not sure how much context fits in 30 GB, but it should be a good amount.
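
A back-of-the-envelope sketch in Python (the per-token KV cache cost is an assumption here; the real figure depends on layer count, KV head dimensions, and cache precision):

    # Rough estimate: how many tokens of context fit in leftover VRAM.
    # kv_kb_per_token is a placeholder, not a published figure.
    def context_tokens(free_gb: float, kv_kb_per_token: float) -> int:
        return int(free_gb * 1e9 / (kv_kb_per_token * 1e3))

    # If the cache cost ~100 KB/token (assumed), 30 GB of headroom holds:
    print(context_tokens(30, 100))  # 300000 tokens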

It seems to be 160 GB at mixed FP4+FP8 precision, FYI. Full FP8 is 250 GB+. (B)F16 would be around double that, I'd assume.

There is no BF16. There is no FP8 for the instruct model. The instruct model at full precision is 160 GB (mixed FP4 and FP8). The base model at full precision is 284 GB (FP8). Almost everyone is going to use instruct. But I do love to see base models released.
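
Those sizes are easy to sanity-check: FP8 is one byte per parameter, so a 284 GB base model implies roughly 284B parameters, and 160 GB for the same parameter count averages out to about 4.5 bits per weight, consistent with mostly-FP4 storage. A minimal sketch:

    # Sanity-check the reported sizes from parameter count and precision.
    params_b = 284  # billions, implied by the 284 GB FP8 base model

    base_gb = params_b * 8 / 8  # FP8 = 1 byte/param -> 284 GB
    avg_bits = 160e9 * 8 / (params_b * 1e9)  # instruct model

    print(base_gb)   # 284.0
    print(avg_bits)  # ~4.5 bits/param -> mostly FP4, some FP8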

> ~100GB at 16 bit or ~50GB at 8bit quantized.

V4 is natively mixed FP4 and FP8, so significantly less than that: 50 GB max, unquantized.


That article is a total hallucination.

"671B total / 37B active"

"Full precision (BF16)"

And they claim they ran this non-existent model on vLLM and SGLang over a month and a half ago.

It's clickbait keyword slop filled in with V3 specs. Most of the web is slop like this now. Sigh.


The Flash version is 284B A13B in mixed FP8/FP4, and the full native-precision weights total approximately 154 GB. The KV cache is said to take 10% as much space as V3's. This looks very accessible for people running "large" local models. It's a nice follow-up to the Gemma 4 and Qwen3.5 small local models.
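
For a sense of why an A13B model is so accessible: each generated token only has to read the ~13B active parameters, so the bandwidth-bound decode ceiling is memory bandwidth divided by active-weight bytes. A hedged sketch; the ~4.3 bits/param average follows from 154 GB / 284B params, but the 1 TB/s bandwidth is an assumed example machine, and real throughput will be lower:

    # Bandwidth-bound decode ceiling for a sparse MoE ("284B A13B").
    def max_tokens_per_s(active_params_b, avg_bits, bandwidth_gb_s):
        active_bytes = active_params_b * 1e9 * avg_bits / 8
        return bandwidth_gb_s * 1e9 / active_bytes

    # 154 GB / 284B params ~= 4.3 bits/param; 1 TB/s is assumed.
    print(max_tokens_per_s(13, 4.3, 1000))  # ~143 tok/s upper bound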

The price is appealing to me. I've been using Gemini 3 Flash mainly for chat. I may give this a try.

input/output: $0.14 / $0.28 (vs Gemini's $0.50 / $3)

Does anyone know why output prices have such a big gap?


Output is where the compute goes: generating tokens takes far more hardware time than prompt processing (input), which is much faster.

Input tokens are processed at 10-50x the speed of output tokens, since they can be processed in batches rather than one at a time like output tokens.
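
A toy numpy sketch of the difference (the shapes and the tanh "forward step" are made up; it just contrasts one batched matmul for prefill against a sequential decode loop):

    import time
    import numpy as np

    d = 2048  # hypothetical hidden size
    W = np.random.randn(d, d).astype(np.float32)
    prompt = np.random.randn(1000, d).astype(np.float32)  # 1000 input tokens

    t0 = time.perf_counter()
    _ = prompt @ W  # prefill: all 1000 tokens in one batched matmul
    t1 = time.perf_counter()

    x = prompt[-1]
    for _ in range(1000):  # decode: 1000 sequential steps
        x = np.tanh(x @ W)  # each output feeds back as the next input
    t2 = time.perf_counter()

    print(f"prefill {t1 - t0:.3f}s vs decode {t2 - t1:.3f}s")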

I'm going to blow my bandwidth allowance again this month, aren't I?
