“Zero point” is how I saw it referred to in the literature, so that’s what I went with. I personally prefer to think of it as an offset, but I try to stick with terms folks are likely to see in the wild.
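The offset framing is easy to see in code. Here's a minimal sketch of uint8 asymmetric quantization (the function and variable names are mine, not from any particular library): the zero point is just the shift applied after scaling so that a real value of 0.0 maps exactly onto an integer code.

    import numpy as np

    def quantize(x, scale, zero_point):
        # Scale down, then shift by the zero point so that the real
        # value 0.0 lands exactly on an integer code.
        return np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)

    def dequantize(q, scale, zero_point):
        # Undo the shift, then scale back up.
        return (q.astype(np.float32) - zero_point) * scale

    x = np.array([-0.4, 0.0, 0.3], dtype=np.float32)
    scale, zero_point = 1.0 / 255, 128
    print(dequantize(quantize(x, scale, zero_point), scale, zero_point))
    # ~[-0.4, 0.0, 0.298] -- close to the originals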
Definitely could be, but in the time I spent talking to the 4-bit models compared to the 16-bit original, they still seemed surprisingly capable. I do recommend benchmarking quantized models on the specific tasks you care about.
Thank you! I was really surprised how robust models are to losing information. It seems wrong that they can be compressed so much and still function at all, never mind perform nearly as well as the original.
I think we're only going to see more progress in this area on the research side, too.
Personally, I don’t think the 1.58-bit work is going to make it into the mainstream.
Not sure why you think fractional representations are only useful for training? Being able to natively compute in lower precisions can be a huge performance boost at inference time.
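To make that concrete, here's a rough sketch in plain numpy (names are mine, and numpy only demonstrates that the arithmetic works out; the real speedup comes from hardware int8 paths like tensor cores): weights and activations stay in int8, accumulation happens in int32, and you only go back to float once at the end.

    import numpy as np

    def int8_matmul(a_q, b_q, a_scale, b_scale):
        # Multiply in int8, accumulate in int32, and only convert back
        # to float once at the very end.
        acc = a_q.astype(np.int32) @ b_q.astype(np.int32)
        return acc.astype(np.float32) * (a_scale * b_scale)

    rng = np.random.default_rng(0)
    a = rng.standard_normal((4, 8)).astype(np.float32)
    b = rng.standard_normal((8, 4)).astype(np.float32)

    # Symmetric per-tensor quantization to int8 (zero point of 0).
    a_scale = np.abs(a).max() / 127
    b_scale = np.abs(b).max() / 127
    a_q = np.round(a / a_scale).astype(np.int8)
    b_q = np.round(b / b_scale).astype(np.int8)

    # Small error relative to the float32 matmul.
    print(np.abs(int8_matmul(a_q, b_q, a_scale, b_scale) - a @ b).max())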
I wrote a tool called llmwalk (https://github.com/samwho/llmwalk) that’ll deterministically show you the likelihoods of the top N answers for a given open model and prompt. No help on frontier models, but maybe helpful if you want to run a similar analysis more quickly on open models!
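If you'd rather roll your own, the core of the idea is only a few lines with the transformers library. This is a simplified sketch of the general technique, not what llmwalk actually does under the hood, and "gpt2" is just a stand-in for whichever open model you care about.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "gpt2"  # stand-in; use any open causal LM
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    inputs = tokenizer("The capital of France is", return_tensors="pt")
    with torch.no_grad():
        # Logits at the last position score every candidate next token.
        logits = model(**inputs).logits[0, -1]

    # Softmax turns logits into the model's next-token probabilities.
    probs = torch.softmax(logits, dim=-1)
    top = torch.topk(probs, k=5)
    for p, idx in zip(top.values, top.indices):
        print(f"{tokenizer.decode(int(idx))!r}: {p.item():.3f}")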
Something you don’t really mention in the post is why you’re doing this. Do you have an end goal or utility in mind for the bookshelf? Is it literally just to track ownership? What do you do with that information?