I don't want a nudge. I want a clear RED WARNING with "You've gone away from your computer a bit too long and chatted too much at the coffee machine. You're better off starting a new context!"
Why is nobody even asking why that should be an issue? No other text editor shits the bed that way. The whole point of the computer is that it patiently waits for my input.
Some claim that the recent smaller local models are as good as last year's Sonnet 4.5, and that the bigger high-end models can be almost as good as Claude, Gemini, and Codex today. Others say they're benchmaxed and not representative.
To try things out, you can use llama.cpp with Vulkan (or even CPU) and a small model like Gemma 4 26B-A4B, Gemma 4 31B, Qwen 3.5 35-A3B, or Qwen 3.5 27B. Some of the smaller quants fit within 16 GB of GPU memory. The default people usually go with now is Q4_K_XL, a 4-bit quant that's a decent trade-off between quality and size.
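As a rough back-of-the-envelope check (my own numbers, not from the thread: 4-bit K-quants land around ~4.8 bits per weight on average, 3-bit quants around ~3.5, plus a GB or two for KV cache and buffers), you can estimate whether a given model fits in 16 GB:

```python
def quant_size_gb(params_billions: float, bits_per_weight: float,
                  overhead_gb: float = 1.5) -> float:
    """Rough size estimate for a quantized model (weights + runtime overhead).

    bits_per_weight is an assumption: ~4.8 for 4-bit K-quants like Q4_K_XL,
    ~3.5 for 3-bit quants. overhead_gb is a guess at KV cache and buffers
    at a modest context size.
    """
    # 1e9 params * (bits / 8) bytes per param = that many GB of weights
    weights_gb = params_billions * bits_per_weight / 8
    return weights_gb + overhead_gb

# Does a 27B dense model fit in 16 GB of VRAM?
for label, bpw in [("Q4_K_XL ~4.8 bpw", 4.8), ("Q3 ~3.5 bpw", 3.5)]:
    total = quant_size_gb(27, bpw)
    verdict = "fits" if total <= 16 else "does not fit"
    print(f"27B at {label}: ~{total:.1f} GB -> {verdict} in 16 GB")
```

So at 27B the 4-bit quant is already over 16 GB once you add overhead, which is why people drop to a smaller quant or a smaller model, or offload.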
Get a second-hand 3090/4090 or buy a new Intel Arc Pro B70. Use MoE models and offload to RAM for the best bang for your buck. For speed, try to find a model that fits entirely within VRAM. If you want to use multiple GPUs, you might want to switch to vLLM or something else.
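For what that looks like in practice, here's a command-line sketch (not verified: flag names are from recent llama.cpp builds, so check `llama-server --help` on yours, and the model paths are placeholders):

```shell
# Dense model that fits in VRAM: push all layers to the GPU for speed.
llama-server -m qwen-27b-q4_k_xl.gguf -ngl 99 -c 8192 --port 8080

# MoE model bigger than VRAM: keep attention/shared weights on the GPU,
# but override the expert tensors to CPU RAM (the "offload to RAM" trick).
llama-server -m moe-model-q4_k_xl.gguf -ngl 99 -c 8192 --port 8080 \
  -ot "ffn_.*_exps.*=CPU"
```

The expert tensors are only sparsely activated per token, so parking them in RAM hurts far less than offloading whole layers of a dense model.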
Mistral isn't that great. DeepSeek was good when they first added thinking. But most people just try something once, and if it doesn't work on that model, then the model is "bad." And Claude, Codex, and Gemini really are that much better now, but when the providers quantize or cut limits, they destabilize, and you're right: you might as well just use something worse but reliable.
I regularly compare models. You're right, DeepSeek was more impressive when the latest version came out. But since then they've accepted slower throughput to keep the same quality.
I often compare with Gemini. Sure, those Google servers are super fast. But I can't say it's better. Qwen and DeepSeek simply work better for me.
Haven't tested Mistral in a while, you may be right.
People try what they feel comfortable with: U.S. models (I can see the logic), but mostly for brand recognition. Anthropic and OpenAI are the best, aren't they? When the models jam, they blame themselves.
It's absolutely ridiculous how stupid Claude is now. I noticed it sometimes last year too, but now it feels like we're back on last year's pre-December model.
I also noticed this: just resuming something eats up your entire session. The past two weeks have also felt like a substantial downgrade and made me regret renewing my subscription. It sucks; I wish I'd kept my Codex subscription and renewed that instead.
Well, blogging about how it's important can certainly give insight to others about the age of your credentials, just in case repeatedly shouting "Get off my lawn!" didn't suffice.
How does that work? Is there a VSCode extension that works with all of them? I’ve only used the Claude Code extension for VSCode and would prefer something like that.