There are no hard rules regarding quants, except that less quantization (more bits per weight) is better for quality.

However, models respond very differently, and there are tricks you can do, like limiting quantization of certain layers. Some models generally behave fine down into sub-Q4 territory, while others don't do well below Q8 at all. And then you have the way it was quantized on top of that.

So you either find some actual benchmarks, which can be rare, or you just have to try it yourself.

As an example, Unsloth recently released some benchmarks[1] which showed Qwen3.5 35B tolerating quantization very well, except for a few layers which were very sensitive.
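To give a rough feel for why some layers can be far more sensitive than others, here's a toy sketch. It is not how GGUF quants or Unsloth's method actually work: it assumes plain symmetric round-to-nearest quantization with one scale per layer, and the two "layers" (`smooth_layer`, `outlier_layer`) are made-up random data, not real model weights. The point is only that a few large outlier weights stretch the scale and wipe out precision for everything else.

```python
import math
import random


def quantize_rtn(weights, bits):
    """Toy symmetric round-to-nearest quantization with a single scale,
    followed by dequantization. Real quant formats are block-wise and smarter."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(w) for w in weights) / qmax
    if scale == 0:
        return list(weights)
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in weights]


def relative_error(original, restored):
    """L2 norm of the quantization error divided by L2 norm of the original."""
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(original, restored)))
    den = math.sqrt(sum(a * a for a in original))
    return num / den


rng = random.Random(0)

# Hypothetical layers: one well-behaved, one with a few large outliers.
layers = {
    "smooth_layer": [rng.gauss(0, 1) for _ in range(4096)],
    "outlier_layer": [
        50.0 if i % 512 == 0 else rng.gauss(0, 1) for i in range(4096)
    ],
}

# Relative error per layer at a few bit widths.
errors = {
    name: {bits: relative_error(w, quantize_rtn(w, bits)) for bits in (8, 4, 3)}
    for name, w in layers.items()
}

for name, per_bits in errors.items():
    for bits, err in per_bits.items():
        print(f"{name:14s} {bits}-bit relative error: {err:.3f}")
```

In this toy setup the layer with outliers loses much more at 4 bits than the smooth one, which is the sort of thing that motivates keeping a few layers at higher precision. Weight error is only a proxy though, so you'd still want to check actual model output.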

edit: Unsloth has a page detailing their updated quantization method here[2], which was just submitted[3].

[1]: https://unsloth.ai/docs/models/qwen3.5/gguf-benchmarks

[2]: https://unsloth.ai/docs/basics/unsloth-dynamic-2.0-ggufs

[3]: https://news.ycombinator.com/item?id=47192505