Hacker News

I am a total neophyte when it comes to LLMs, and only recently started poking around into the internals of them. The first thing that struck me was that float32 dimensions seemed very generous.

I then discovered what quantization is by reading a blog post about binary quantization. That seemed too good to be true. I asked Claude to design an analysis assessing the fidelity of 1-, 2-, 4-, and 8-bit quantization. Claude did a good job, downloading 10,000 embeddings from a public source and computing a similarity score and correlation coefficient for each level of quantization against the float32 source of truth. 1- and 2-bit quantizations were about 90% similar, and 8-bit quantization was lossless at the precision Claude used to display the results. 4-bit was interesting: at 99% similar it was almost lossless, yet half the size of 8-bit. It seemed like the sweet spot.
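For anyone curious what that kind of analysis looks like, here's a minimal sketch. It's an assumption-laden stand-in for what Claude actually ran: random unit vectors replace the downloaded embeddings, and a simple uniform min/max quantizer replaces whatever scheme Claude chose.

```python
# Hypothetical reconstruction of the quantization-fidelity experiment:
# quantize embeddings to k bits, then check how well pairwise cosine
# similarities correlate with the float32 "source of truth".
import numpy as np

rng = np.random.default_rng(0)
emb = rng.standard_normal((1000, 64)).astype(np.float32)
emb /= np.linalg.norm(emb, axis=1, keepdims=True)  # unit-norm embeddings

def quantize(x, bits):
    """Uniform quantization to 2**bits levels, then dequantize back to float."""
    levels = 2 ** bits - 1
    lo, hi = x.min(), x.max()
    scale = (hi - lo) / levels
    return np.round((x - lo) / scale) * scale + lo

def cosine(a, b):
    return np.sum(a * b, axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    )

# Sample random embedding pairs and score them at float32 precision.
i, j = rng.integers(0, 1000, 200), rng.integers(0, 1000, 200)
ref = cosine(emb[i], emb[j])

for bits in (1, 2, 4, 8):
    deq = quantize(emb, bits)
    r = np.corrcoef(ref, cosine(deq[i], deq[j]))[0, 1]
    print(f"{bits}-bit: correlation with float32 similarities = {r:.4f}")
```

With this toy setup the correlation climbs toward 1.0 as the bit width increases, with 4-bit already very close — the same qualitative shape as the result described above.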

This analysis took me all of an hour so I thought, "That's cool but is it real?" It's gratifying to see that 4 bit quantization is actually being used by professionals in this field.




4-bit quantization on newer Nvidia hardware is being supported in training as well these days. I believe the gpt-oss models were trained natively in MXFP4, a 4-bit floating-point format (E2M1: 2 exponent bits, 1 mantissa bit, plus 1 sign bit).

It doesn't seem terribly common yet though. I think it is challenging to keep it stable.

[1] https://www.opencompute.org/blog/amd-arm-intel-meta-microsof...

[2] https://www.opencompute.org/documents/ocp-microscaling-forma...


MXFP4 is a block-based floating-point format. The E2M1 format applies to individual values, but each 32-value block also carries a shared 8-bit exponent that provides scaling information for the whole block.
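To make that concrete, here's a small sketch of how decoding such a block could work. This is my reading of the OCP MX spec linked above, not production code; the function names are made up, and I'm assuming the shared exponent is stored E8M0-style with a bias of 127.

```python
# Decode MXFP4: each value is 4-bit E2M1, and 32 of them share one
# 8-bit exponent that scales the whole block.
import numpy as np

def decode_e2m1(code):
    """Decode a 4-bit E2M1 code: 1 sign bit, 2 exponent bits, 1 mantissa bit."""
    sign = -1.0 if code & 0b1000 else 1.0
    exp = (code >> 1) & 0b11
    man = code & 0b1
    if exp == 0:
        mag = 0.5 * man                       # subnormal: 0 or 0.5
    else:
        mag = 2.0 ** (exp - 1) * (1.0 + 0.5 * man)  # 1, 1.5, 2, 3, 4, 6
    return sign * mag

def decode_mxfp4_block(codes, shared_exp):
    """Scale 32 E2M1 values by the block's shared exponent, 2**(shared_exp - 127)."""
    scale = 2.0 ** (shared_exp - 127)
    return np.array([decode_e2m1(c) for c in codes]) * scale

# The eight positive E2M1 magnitudes:
print([decode_e2m1(c) for c in range(8)])  # [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
```

Note how coarse the per-value grid is — only eight magnitudes — which is why the shared block scale matters so much for dynamic range.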

There's also work on ternary models that's quite interesting, because the arithmetic operations are super fast and they're extremely cache efficient. Well worth looking into if that's the sort of thing that interests you.
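A toy illustration of why the arithmetic is so fast: with weights restricted to {-1, 0, +1}, a matrix-vector product needs no multiplications at all, only additions and subtractions. This is a generic sketch of the idea, not any particular ternary model's kernel.

```python
# A matvec with ternary weights reduces to selective add/subtract.
import numpy as np

rng = np.random.default_rng(0)
W = rng.integers(-1, 2, size=(4, 8))  # ternary weight matrix in {-1, 0, 1}
x = rng.standard_normal(8)

# Standard matvec...
y_mul = W @ x

# ...is equivalent to adding where the weight is +1 and subtracting where
# it's -1, skipping zeros entirely (multiply-free):
y_addsub = np.array([x[row == 1].sum() - x[row == -1].sum() for row in W])

assert np.allclose(y_mul, y_addsub)
```

In real ternary inference kernels this shows up as bitmask/popcount tricks instead of numpy indexing, but the principle is the same.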

Mind sharing any resources? I've been thinking about trying to understand them better myself.

This is an ongoing course at CMU you can shadow.

https://modernaicourse.org/


That's cool.

I do wonder where the extra acuity from that last 1% shows up in practice. I hate that I have basically no way to tell intuitively, given how much of a black box the system is.


Well, why would Claude know any of this? It's arguably the wrong criterion anyway: if you have your own dataset to benchmark against, you'd create your own calibration for quantization with it. And scientifically, you wouldn't really believe in the whole process of gradient descent if you didn't think tiny differences in these values mattered. So...

I think you might be replying to a different person or misunderstanding what I said, but you're right that, just as I don't have an intuition for where the acuity shows up in the corpus, I don't think Claude does either.


