Hacker News

> LLaMA can apparently run quantized to 4 bits per param (not sure if worth it though)

From the GPTQ paper https://arxiv.org/abs/2210.17323:

"... with negligible accuracy degradation relative to the uncompressed baseline"
