Hacker News

I would really like to have something local, without running into ChatGPT limits or costs. I've looked at some text-generation models, but I couldn't find one where you can ask a question and it generates correct code.

Does anybody know if this is already available on Hugging Face or somewhere else?

I'm using TypeScript, so my idea would be to fine-tune on all the packages I use, maybe on their documentation as well.

I've created a small app that generates a prompt for each eslint error and asks ChatGPT to come up with a git diff. I manually feed the diff back into my application, which runs eslint and the TypeScript compiler; if there are still errors, it generates a new prompt. This works quite well.
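The loop described above can be sketched roughly as follows. This is an illustrative sketch, not the poster's actual code: the names (`buildPrompt`, `fixUntilClean`, the `LintError` shape) are made up, and the model call, diff application, and linter are passed in as callbacks since in practice they would wrap ChatGPT, `git apply`, and eslint respectively.

```typescript
// Shape of a single eslint finding (illustrative, not eslint's real API).
interface LintError {
  file: string;
  line: number;
  rule: string;
  message: string;
}

// Turn one eslint error plus the offending source into a model prompt
// that asks for a unified git diff as the answer.
function buildPrompt(error: LintError, source: string): string {
  return [
    `Fix the following eslint error in ${error.file}:${error.line}.`,
    `Rule: ${error.rule} -- ${error.message}`,
    "Respond with a unified git diff only.",
    "```",
    source,
    "```",
  ].join("\n");
}

// Drive the feedback loop: ask the model for a diff, apply it, re-lint,
// and repeat until the linter is clean or we run out of rounds.
async function fixUntilClean(
  source: string,
  lint: (src: string) => LintError[],
  askModelForDiff: (prompt: string) => Promise<string>,
  applyDiff: (src: string, diff: string) => string,
  maxRounds = 5,
): Promise<string> {
  let current = source;
  for (let round = 0; round < maxRounds; round++) {
    const errors = lint(current);
    if (errors.length === 0) return current; // linter is clean: done
    const diff = await askModelForDiff(buildPrompt(errors[0], current));
    current = applyDiff(current, diff); // feed the fix back into the loop
  }
  return current; // best effort after maxRounds
}
```

In the setup described above, `askModelForDiff` would be the manual ChatGPT round-trip (or an API call once one is available), and `lint` would shell out to eslint and the TypeScript compiler.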

GPT-4 is not available for me in the OpenAI API, so it's quite cumbersome.



The best option right now is SuperCOT 30B, a LoRA for LLaMA trained on chain-of-thought coding questions and answers. You will need at least 24GB of VRAM to load the 4-bit GPTQ-quantized LoRA-merged model* locally. https://huggingface.co/tsumeone/llama-30b-supercot-4bit-128g...

*The merged model already contains the LoRA. Applying the LoRA over a "raw" LLaMA model uses more VRAM and does not allow for full context in 24GB of VRAM.



