If tomorrow Claude pricing changes and they start charging real API costs, like 2000+ USD, and there's another service, "NotReallyClaude", that's a bit less good but 200 USD, then what would you do?
The massive DC overbuild matches demand, prices normalise somewhat in 3-5 years.
The massive DC overbuild does not match demand, prices tank in 3-5 years.
Third possibility: some approach like Taalas renders the current storyline meaningless. I'd put it at 3-in-10 odds, but I'd looove to see it.
Fourth: the entire planet gets profoundly sick of emdashes, we all move back into caves and live in eternal gratitude for the moment humanity woke up to how little all of this really matters.
set the Anthropic base URL in CC to your proxy server and map each model to your preferred models (I keep opus↔opus, but technically you can map opus↔gpt-5.3, etc.). Then check the incoming messages for the string that triggers compaction (it's a system prompt, btw) and modify that message before it hits the LLM server.
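A minimal sketch of the rewrite logic such a proxy would apply to each request body before forwarding it. The model names, `MODEL_MAP`, `COMPACTION_MARKER`, and the marker text itself are all placeholders I'm making up for illustration; the real compaction trigger string depends on your Claude Code version, so you'd log a few requests and copy it from there.

```python
# Hypothetical request-rewriting step for a local Anthropic-API proxy.
# Everything name-wise here is a placeholder, not the real trigger string.

MODEL_MAP = {
    "claude-opus-4": "claude-opus-4",  # opus↔opus; swap the value to reroute
}
COMPACTION_MARKER = "summarize the conversation"  # placeholder marker text

def rewrite_request(body: dict) -> dict:
    """Remap the model name and patch a compaction-style system prompt."""
    body = dict(body)  # shallow copy so the caller's dict is untouched
    if body.get("model") in MODEL_MAP:
        body["model"] = MODEL_MAP[body["model"]]
    # The Messages API "system" field can be a plain string or a list of
    # content blocks; this sketch only handles the simple string case.
    system = body.get("system")
    if isinstance(system, str) and COMPACTION_MARKER in system:
        body["system"] = system.replace(
            COMPACTION_MARKER, "my custom compaction prompt"
        )
    return body
```

The actual forwarding (reading the JSON, calling `rewrite_request`, POSTing upstream) is the boring part; any small HTTP server works, and `ANTHROPIC_BASE_URL` just needs to point at it.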
I do like the idea of an aftermarket of ancient LLM chips that still have tons of useful life for text-processing tasks etc. They don't talk about their architecture much; I wonder how well the power can scale down. 200W for such a small model is not something I see happening in a laptop any time soon. Pretty hilarious implications for the moat-building of the big providers too.
Yea I mean this is the first publishable draft of a startup cooking on this.
I'm confident there are at least 1-2 OOMs of improvement to come here in terms of the (intelligence : wattage) ratio.
I really thought we were going to need a couple of dramatic OOM-level improvements in the model composition / software layer to get models of Opus 3.7's capability running on our laptops.
This release tells me that eventual breakthrough won't even be strictly necessary, imo.
The way I imagine it, in 2-4 years we're going to be hit with a triple glut: better architectures, a massive oversupply of hardware, and potentially one or two hardware efforts like this really taking off. It's pretty crazy that we're already 4 years in and, outside of very niche / low-availability solutions, it's still GPU or bust.
That's interesting! How do you see "oversupply of hardware" playing out?
Is it because we stop doing ~2024-style large-scale training (marginal returns aren't worth it)? Or because supply way outpaces training+inference demand?
AFAIU, if the trend lines / S-curves keep chugging along as they are, we won't hit hardware oversupply for a long, long time without some sort of AI training winter.
One of these things, however old, coupled with robust tool calling, is a chip that could remain useful for decades. Baking in incremental updates of world knowledge isn't all that useful. It's kinda horrifying if you think about it: this chip, among other things, contains knowledge of Donald Trump encoded in silicon. I think this is a way cooler legacy for Melania than the movie haha.