Hacker News

I've worked on ML model quantization. The open-source 4-bit and 8-bit quantization schemes aren't as good as one can get - there are much fancier techniques that preserve predictive performance while squeezing size.

Some techniques (like quantization-aware training) involve changes to training.
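The usual description of quantization-aware training is that a "fake quantize" step is inserted into the forward pass, so the network learns weights that survive rounding; gradients flow through as if the rounding weren't there (the straight-through estimator). A minimal sketch of the forward-pass part - function names and the uniform-grid choice are my own, not any particular library's:

```python
def fake_quantize(w, bits=8):
    """Round a list of float weights onto a uniform b-bit grid, but return
    floats so the rest of the network (and the backward pass, via the
    straight-through estimator) keeps operating in full precision.
    Toy sketch: real implementations work on tensors and learn the scale."""
    levels = (1 << bits) - 1  # number of gaps in the quantization grid
    lo, hi = min(w), max(w)
    scale = (hi - lo) / levels if hi > lo else 1.0
    # snap each weight to the nearest grid point, then map back to float
    return [round((x - lo) / scale) * scale + lo for x in w]
```

During training you'd apply this in the forward pass only, so by deployment time the weights already sit near grid points and the final rounding loses little accuracy.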



I'm sure there are better methods! But in this case, MKML's numbers just don't look impressive when placed alongside the prominent quantization techniques already in use. According to this chart [0] it's most similar in size to a Q6_K quantization, and if anything has slightly worse perplexity.

If their technique were better, I imagine that the company would acknowledge the existence of the open source techniques and show them in their comparisons, instead of pretending the only other option is the raw fp16 model.

[0] https://old.reddit.com/r/LocalLLaMA/comments/142q5k5/updated...


From what I remember, compression schemes whose bit widths aren't a power of 2 tank inference speed (assuming Q6_K is 6-bit; I haven't actually verified whether ggml's q6_K llama is slow). Meanwhile, the site claims a speed-up.
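To make the alignment point concrete: when the bit width doesn't divide 8, packed values straddle byte boundaries, so every load and store pays for extra shifts and masks. A toy 6-bit packer (purely illustrative - not ggml's actual Q6_K layout, which also carries per-block scales):

```python
def pack6(vals):
    """Pack 6-bit unsigned ints into bytes. Since 6 doesn't divide 8,
    elements cross byte boundaries and need bit-shuffling on every access -
    the per-element overhead a non-power-of-2 format pays at inference."""
    acc = nbits = 0
    out = bytearray()
    for v in vals:
        acc |= (v & 0x3F) << nbits
        nbits += 6
        while nbits >= 8:        # flush whole bytes as they fill up
            out.append(acc & 0xFF)
            acc >>= 8
            nbits -= 8
    if nbits:                    # leftover partial byte
        out.append(acc & 0xFF)
    return bytes(out)

def unpack6(data, n):
    """Recover the first n 6-bit values from a pack6() byte stream."""
    acc = nbits = 0
    vals = []
    for b in data:
        acc |= b << nbits
        nbits += 8
        while nbits >= 6 and len(vals) < n:
            vals.append(acc & 0x3F)
            acc >>= 6
            nbits -= 6
    return vals
```

Compare with 4-bit or 8-bit formats, where two values share a byte (or one value is a byte) and unpacking is a single fixed shift - that's the usual argument for power-of-2 widths being faster.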

But I do actually agree with you - they should really be benchmarking against popular competitors. In my experience, fancier quantization is a _lot_ of work for fairly little gain (at least for neural nets). I also think that ML techniques such as quantization (or fancy param sweeps, feature pruning, that kind of stuff) tend to either get in-housed (i.e. the model will come quantized from the source) or get open-sourced.

In-housing of ML techniques tends to happen more often if there's a money-making model where the hardware running the model costs money, but running the model brings in money.


What about Unum's quantization methods?

https://github.com/unum-cloud/usearch


Not familiar with Unum. From a quick glance, it seems that they truncate least-significant bits, which is the simplest, but also the fastest, quantization method.
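For what it's worth, "truncating LSBs" on a float usually means zeroing the low mantissa bits of its IEEE-754 representation - cheap because it's a single mask, with no rescaling or codebook. A hypothetical float32 version (I haven't checked what usearch actually does):

```python
import struct

def truncate_lsb(x, keep_bits):
    """Keep only the top `keep_bits` of the 23-bit float32 mantissa,
    zeroing the rest. One mask per value; sign and exponent untouched.
    Illustrative sketch, not Unum's implementation."""
    (bits,) = struct.unpack('<I', struct.pack('<f', x))
    mask = ~((1 << (23 - keep_bits)) - 1) & 0xFFFFFFFF
    (y,) = struct.unpack('<f', struct.pack('<I', bits & mask))
    return y
```

The trade-off is the usual one: it's essentially free at runtime, but it spends its bit budget uniformly instead of adapting to the weight distribution the way k-means or block-scaled schemes do.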



