Hacker News

> LLaMA can apparently run quantized to 4 bits per param (not sure if worth it though)

From the GPTQ paper https://arxiv.org/abs/2210.17323:

"... with negligible accuracy degradation relative to the uncompressed baseline"
