I don't want a nudge. I want a clear RED WARNING with "You've gone away from your computer a bit too long and chatted too much at the coffee machine. You're better off starting a new context!"
Why is nobody even asking why that should be an issue? No other text editor shits the bed that way. The whole point of the computer is that it patiently waits for my input.
Some claim that the recent smaller local models are as good as last year's Sonnet 4.5, and that the bigger high-end models can be almost as good as Claude, Gemini, and Codex today. Others say they're benchmaxed and not representative.
To try things out, you can use llama.cpp with Vulkan (or even CPU) and a small model like Gemma 4 26B-A4B, Gemma 4 31B, Qwen 3.5 35-A3B, or Qwen 3.5 27B. Some of the smaller quants fit within 16 GB of GPU memory. The default people usually go with now is Q4_K_XL, a 4-bit quant that's a decent trade-off between quality and size.
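As a rough back-of-the-envelope check (my own numbers, not from the thread: 4-bit K-quants land around ~4.8 bits per weight on average, 3-bit quants around ~3.5, plus a GB or two for KV cache and buffers), you can estimate whether a given model fits in 16 GB:

```python
def quant_size_gb(params_billions: float, bits_per_weight: float,
                  overhead_gb: float = 1.5) -> float:
    """Rough size estimate for a quantized model (weights + runtime overhead).

    bits_per_weight is an assumption: ~4.8 for 4-bit K-quants like Q4_K_XL,
    ~3.5 for 3-bit quants. overhead_gb is a guess at KV cache and buffers
    at a modest context size.
    """
    # 1e9 params * (bits / 8) bytes per param = that many GB of weights
    weights_gb = params_billions * bits_per_weight / 8
    return weights_gb + overhead_gb

# Does a 27B dense model fit in 16 GB of VRAM?
for label, bpw in [("Q4_K_XL ~4.8 bpw", 4.8), ("Q3 ~3.5 bpw", 3.5)]:
    total = quant_size_gb(27, bpw)
    verdict = "fits" if total <= 16 else "does not fit"
    print(f"27B at {label}: ~{total:.1f} GB -> {verdict} in 16 GB")
```

So at 27B the 4-bit quant is already over 16 GB once you add overhead, which is why people drop to a smaller quant or a smaller model, or offload.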
Get a second-hand 3090/4090 or buy a new Intel Arc Pro B70. Use MoE models and offload to RAM for the best bang for your buck. For speed, try to find a model that fits entirely within VRAM. If you want to use multiple GPUs, you might want to switch to vLLM or something else.
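For what that looks like in practice, here's a command-line sketch (not verified: flag names are from recent llama.cpp builds, so check `llama-server --help` on yours, and the model paths are placeholders):

```shell
# Dense model that fits in VRAM: push all layers to the GPU for speed.
llama-server -m qwen-27b-q4_k_xl.gguf -ngl 99 -c 8192 --port 8080

# MoE model bigger than VRAM: keep attention/shared weights on the GPU,
# but override the expert tensors to CPU RAM (the "offload to RAM" trick).
llama-server -m moe-model-q4_k_xl.gguf -ngl 99 -c 8192 --port 8080 \
  -ot "ffn_.*_exps.*=CPU"
```

The expert tensors are only sparsely activated per token, so parking them in RAM hurts far less than offloading whole layers of a dense model.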
Mistral isn't that great. DeepSeek was good when they first added thinking. But most people just try something once, and if it doesn't work on that model, then the model is "bad." And Claude, Codex, and Gemini really are that much better now, but when the providers quantize or cut limits, they destabilize, and you're right: you might as well just use something worse but reliable.
I regularly compare models. You're right, DeepSeek was more impressive when the latest version came out. But since then they've accepted slower throughput to keep the same quality.
I often compare with Gemini. Sure, those Google servers are super fast. But I can't say it's better. Qwen and DeepSeek simply work better for me.
Haven't tested Mistral in a while, you may be right.
People try what they feel comfortable with: U.S. models (I can see the logic), but mostly for brand recognition. Anthropic and OpenAI are the best, aren't they? When the models jam, they blame themselves.
It's absolutely ridiculous how stupid Claude is now. I noticed it sometimes last year too, but now it feels like we're back on last year's pre-December model.
I also noticed this: just resuming something eats up your entire session. The past two weeks have also felt like a substantial downgrade and made me regret renewing my subscription. It sucks; I wish I'd kept my Codex subscription and renewed that instead.
Well, blogging about how it's important can certainly give insight to others about the age of your credentials, just in case repeatedly shouting "Get off my lawn!" didn't suffice.
How does that work? Is there a VSCode extension that works with all of them? I’ve only used the Claude Code extension for VSCode and would prefer something like that.