I currently run qwen3.5-122B (Q4) on a Strix Halo (Bosgame M5) and am pretty happy with it. It's obviously much slower than hosted models: I get ~20 t/s with an empty context and drop to about 14 t/s with 100k of context filled.
No tuning at all, just apt install rocm and rebuilding llama.cpp every week or so.
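For reference, a setup like the one described above can be reproduced roughly as follows. This is a sketch, not an exact recipe: the repo URL and the `GGML_HIP` CMake flag reflect recent llama.cpp, and the model path is a placeholder.

```shell
# install ROCm from the distro repos (Debian/Ubuntu)
sudo apt install rocm

# build llama.cpp with the HIP (ROCm) backend
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_HIP=ON
cmake --build build -j

# measure prompt-processing and generation speed for a GGUF model
./build/bin/llama-bench -m /path/to/model-Q4_K_M.gguf
```

Re-running the `git pull` / `cmake --build` steps is all the "rebuilding every week" amounts to; the generation-speed drop with long context shows up if you benchmark at different context depths.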
The privacy/data-security angle really is important in some regions and industries. Think European privacy laws, or customers demanding NDAs. In both cases hosted Anthropic and OpenAI offerings are worth zero, so they are easy to beat despite local models being dumber and slower.
1. Gemma-4 we re-uploaded 4 times - 3 of those were for 10-20 llama.cpp bug fixes, and we had to notify people to download the corrected ones. The 4th was an official Gemma chat template improvement from Google themselves.
2. Qwen3.5 - we shared our 7TB of research artifacts showing which layers not to quantize. All providers' quants were under-optimized, not broken - the ssm_out and other ssm_* tensors were the issue - and we're now the best in terms of KLD and disk space.
3. MiniMax 2.7 - we swiftly fixed a NaN PPL issue. We found it in all quants regardless of provider, so it affected everyone, not just us. We wrote a post on it and fixed it; others have taken our fix and repaired their quants, while some still haven't updated.
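For readers unfamiliar with the KLD metric mentioned in point 2: quant quality is often scored as the KL divergence between the full-precision model's next-token distribution and the quantized model's, averaged over a corpus. A minimal sketch of that computation on raw logits (toy data here; a real evaluation runs both models over actual text):

```python
import numpy as np

def kld(logits_ref, logits_quant):
    """Mean per-token KL divergence D(P_ref || P_quant) from raw logits."""
    def softmax(x):
        # subtract the row max for numerical stability
        e = np.exp(x - x.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)
    p = softmax(logits_ref)
    q = softmax(logits_quant)
    return float(np.mean(np.sum(p * (np.log(p) - np.log(q)), axis=-1)))

# toy check: identical logits give zero divergence,
# perturbed logits (simulated quantization error) give a positive one
rng = np.random.default_rng(0)
ref = rng.normal(size=(4, 32))  # 4 "tokens", 32-entry vocab
assert kld(ref, ref) < 1e-9
assert kld(ref, ref + rng.normal(scale=0.1, size=ref.shape)) > 0
```

Lower mean KLD at the same file size means a better quant, which is why selectively keeping sensitive tensors (like the ssm_* ones above) at higher precision can beat a uniform Q4 of every layer.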
Note we also fixed bugs in many OSS models like Gemma 1, Gemma 3, Llama chat template fixes, Mistral, and many more.
Unfortunately quants sometimes break, but we fix them quickly, and 95% of the time the cause is out of our hands.
We fix them swiftly and write up blog posts on what happened. Other providers then simply take our blogs and re-apply our fixes.
Fair enough, appreciate the detailed response! Can you elaborate why other quantizations weren't affected (e.g. bartowski)? Simply because they were straight Q4 etc. for every layer?
Thanks for all the amazing work Daniel. I remember you guys being late to OH because you were working on weights released the night before - and it's great to see you guys keep up the speed!
To add to that, the current take that the US could just walk away from the conflict is incredibly naive - Iran will decide when this is over, and it won't be before the November elections. Before the US attacked, blocking the strait was only a potential, now Trump gave Iran the chance to prove that they are capable of doing it. And why on earth would Iran now give that away for free?
In Germany there was zero investment in the electric infrastructure, but the power allowed to flow from the panels into the grid is currently limited to 800 W for this type of system. Seems to work fine. Larger systems still need a license.
They paid about $10B for inference and had about $10B in revenue in 2025. The user counts and the number of zeroes on those figures are not relevant; what matters is their ratio. They apparently are not even profitable on inference, which is the cheap part of the whole business.
And the cost of inference tripled from $3B in 2024 to $10B in 2025, so the cost of revenue grows linearly with the number of users, i.e. it does not get cheaper.
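The ratio argument in numbers, using exactly the figures cited above:

```python
# reported figures from the comment above (approximate)
revenue_2025 = 10e9      # ~$10B revenue in 2025
inference_2024 = 3e9     # ~$3B inference cost in 2024
inference_2025 = 10e9    # ~$10B inference cost in 2025

# gross margin on inference alone, before any R&D or training spend
gross_margin = (revenue_2025 - inference_2025) / revenue_2025
print(f"inference gross margin: {gross_margin:.0%}")  # 0%

# year-over-year growth of the cost of revenue
print(f"inference cost growth: {inference_2025 / inference_2024:.1f}x")  # 3.3x
```

A ~0% gross margin on the cheapest part of the business, with that cost tripling year over year, is the core of the argument; the absolute dollar amounts don't change it.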
Of course they bundle R&D with inference pricing - how else could they recoup that investment?
The interesting question is: In what scenario do you see any of the players as being able to stop spending ungodly amounts for R&D and hardware without losing out to the competitors?
But only if you ignore all the other market participants, right? How can we ever reach a point where the smaller competitors, e.g. the Chinese labs perpetually trailing SOTA with a ~9 month lag but at a tiny fraction of the cost, stop existing?
I mean, we just have to look at old discussions about Uber for the exact same arguments. Uber, after all these years, is still at a negative 10% lifetime ROI, and that company doesn't even have to meaningfully invest in hardware.
IMO this will probably develop like the railroad boom in the first half of the 19th century: All the AI-only first movers like OpenAI and Anthropic will go bust, just like most railroad companies who laid the tracks, because they can't escape the training treadmill. But the tech itself will stay, and even become a meaningful productivity booster over the next decades.
I am also thinking long term: where is the moat if this inevitably leads to price competition? It's not like a Microsoft product suite that your whole company is tied into in multiple ways - LLMs can be swapped for another quite easily.
> there was a single brain region where we saw that higher cannabis use was actually associated with lower brain volume – the posterior cingulate, which is part of the limbic system and is implicated in processes like memory, learning, and emotion. That said, some research suggests smaller posterior cingulate volume is actually associated with better working memory, so it’s a little unclear what this means.