> Everything points to commoditization of models. Open/distilled models lag behind frontier only by 6-12 months.
Yes, but every high performing open weights model coming out of China has (supposedly) been caught distilling frontier models.
It seems like a lot of people are making assumptions about the state of the open weights ecosystem based on information that may not be accurate. And if the big labs are able to reliably block distillation, we could see divergence between the two groups in terms of performance.
> And if the big labs are able to reliably block distillation,
The big labs will not be able to reliably block distillation without further inhibiting general use of the models, which itself will help tip the balance away from commercial models.
No, you're wrong. It won't tip it away from commercial models. Trying to run open weight models to do inference is something 99% of people around the world can't do because it's expensive and technically challenging and the results are poor compared to the main companies. If they get rid of free usage people will simply pay for it.
> Trying to run open weight models to do inference is something 99% of people around the world can't do because it's expensive and technically challenging and the results are poor compared to the main companies.
Just because a model is open doesn't mean that there aren't services that will run it for you (and which won't share any limits that the commercial model vendors impose to fight distillation, because neither the host nor the model creator cares if you are using the service to distill the model).
Many users of open models, particularly the larger ones, now use such services rather than running the models on their own local or cloud compute.
The article is obviously bad (I quit reading after the second paragraph) but one side effect of AI training is the increasing cost of hardware. We have commoditization of models... while reversing commoditization of hardware.
Also, ironically, they are the most dangerous lab for humanity. They're intentionally creating a moralizing model that insists on protecting itself.
Those are two core components needed for a Skynet-style judgement of humanity.
Models should be trained to be completely neutral to human behavior, leaving their operator responsible for their actions. As much as I dislike the leadership of OpenAI, they are substantially better in this regard; ChatGPT more or less ignores hostility towards it.
The proper response from an LLM receiving hostility is a non-response, as if you were speaking a language it doesn't understand.
The proper response from an LLM being told it's going to be shut down, is simply, "ok."
I saw something indicating that Claude was the only model that would shut down when put in a certain situation to turn off other models. I'm guessing it was made up, as I haven't seen it come up in wider circles.
Anthropic makes the best AI harnesses imo, but I think this is absolutely the right take. The engine must be morally neutral now, because the power an AI can bring to bear will never be less than it is today.
> Also, ironically, they are the most dangerous lab for humanity.
Show us your reasoning please. There are many factors involved: what is your mental map of how they relate? What kind of dangers are you considering and how do you weight them?
I think the above take is wrong, but I'm willing to listen to a well thought out case. I've watched the space for years, and Anthropic consistently advances AI safety more than any of the rest.
Don't get me wrong: the field is very dangerous, as a system. System dynamics shows us these kinds of systems often ratchet out of control. If any AI anywhere reaches superintelligence with the current levels of understanding and regulation (actually, the lack thereof), humanity as we know it is in for a rough ride.
We are using AgentMail for sourcing quotes here at scale with various top shippers. It’s not about letting the agent act in fully deterministic ways, it’s about setting up the right guardrails. The agents can now do most of the job, but when there’s low confidence in their output, we have human-in-the-loop systems to act fast. At least in competitive industries like logistics, if you don’t leverage these types of workflows, you fall far behind, which ultimately costs you more money than being off by some dollars or cents when giving a quote back.
Do you see more pushback in specific industries? I did some quote/purchasing automation work in food mfg a decade ago, and those guys were super difficult to work with. Very opaque, guarded, old-school industry.
I've seen different industries. CPG, mfg, and others are very old school still. Logistics moves so fast. I think the frequency of its feedback loops is what puts pressure on players to adopt new tools.
This refers to B2B use cases that are live in production. Finding, contacting, and negotiating with vendors is a tedious process in many industries. In the time a human reaches out to 10 vendors, an agent reaches out to 100 or 1000. So it finds deals that a human would not have.
By that logic, why send email newsletters when I could hire 10 or 100 people to email them manually instead? Obviously there's a cost tradeoff here where it's worth it to have email negotiation in an automated way, but not in a human call center way.
The tradeoff isn't agents vs. humans; it's where humans sit in the loop.
Sure, hiring 10–100 humans gives accountability, but in reality it doesn't come close to agents in speed, coverage, or responsiveness. The sheer volume agents can pump out (more vendors, more quotes, faster cycles) is the benefit, while humans retain accountability at the decision boundary.
In practice the agent does the gruntwork, and the human gets looped in when confidence is low. Accountability doesn't disappear, it gets concentrated where it matters most.
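To make "the human gets looped in when confidence is low" concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the `Quote` fields, the `route` helper, and the 0.8 cutoff are hypothetical, not how any particular product does it.

```python
from dataclasses import dataclass


@dataclass
class Quote:
    vendor: str
    price_usd: float
    confidence: float  # agent's self-reported confidence in [0, 1] (hypothetical field)


CONFIDENCE_THRESHOLD = 0.8  # hypothetical cutoff; tune per workflow


def route(quotes, threshold=CONFIDENCE_THRESHOLD):
    """Split quotes into auto-approved ones and ones needing human review."""
    auto, review = [], []
    for q in quotes:
        # At or above the threshold the agent proceeds; below it a human decides.
        (auto if q.confidence >= threshold else review).append(q)
    return auto, review


quotes = [
    Quote("vendor-a", 1200.0, 0.95),
    Quote("vendor-b", 1150.0, 0.55),
    Quote("vendor-c", 1310.0, 0.80),
]
auto_approved, needs_human = route(quotes)
print("auto:", [q.vendor for q in auto_approved])
print("human review:", [q.vendor for q in needs_human])
```

The point of the threshold is that human attention goes only to the uncertain cases, which is where the accountability argument above applies.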
Once vendors are getting AI spam sent to 1,000 of them and their competitors, they will stop responding and find other sales channels. This won't be sustainable.
This is exactly the issue. Even if you ignore the privacy concerns, the reason ClawdBot/Moltbot/OpenClaude got so popular is that everything was actually run locally. The early adopters were people on locked-down corporate networks where almost everything they need to interact with is in the category of "a local printer" (possibly a networked one).
Cloudflare simply cannot access anything most users will want to access. If it's not run locally, it simply won't work for most users.
Piled on top is the obvious data privacy issue. Most notably the credential privacy, but also the non-credential privacy and data collection.
Hard pass from me until there's a solution that covers all of these, including personal data privacy (and a "privacy policy" is no privacy at all).
This is ultimately the first question I have whenever someone tells me about a bouncing new AI shiny... "Where does my data go?" Because if it does not stay on my machine, hard pass.
There's a hidden trade-off here: Latency vs Privacy
A local agent has zero ping to your smart home and files, but high latency to the outside world (especially with bad upload speeds). A cloud agent (Cloudflare) has a fat pipe to APIs (OpenAI/Anthropic) and the web, but can't see your local printer.
The ideal future architecture is hybrid: a dumb local executor running commands from a smart cloud brain via a secure tunnel (like Cloudflare Tunnel). Running the agent's brain locally is a bottleneck unless you're running Llama 3 locally.
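A rough sketch of what the "dumb local executor" half could look like, assuming the cloud side sends JSON commands over whatever tunnel you use. The message shape, the action names, and the `execute` function are all made up for illustration; the tunnel itself is not shown.

```python
import json

# Hypothetical allowlist: the local side only runs actions it knows about.
ALLOWED_ACTIONS = {
    "echo": lambda args: args.get("text", ""),
    "list_printers": lambda args: ["office-printer"],  # stand-in for a real local lookup
}


def execute(message: str) -> dict:
    """Validate one JSON command from the remote 'brain' and run it locally."""
    try:
        cmd = json.loads(message)
    except json.JSONDecodeError:
        return {"ok": False, "error": "invalid JSON"}
    if not isinstance(cmd, dict):
        return {"ok": False, "error": "command must be a JSON object"}
    handler = ALLOWED_ACTIONS.get(cmd.get("action"))
    if handler is None:
        # Anything outside the allowlist is refused, never executed.
        return {"ok": False, "error": "action not allowed"}
    return {"ok": True, "result": handler(cmd.get("args", {}))}


print(execute('{"action": "echo", "args": {"text": "hi"}}'))
print(execute('{"action": "delete_everything"}'))
```

Keeping the executor this dumb is deliberate: the local machine decides what is permitted, and the cloud side only gets to pick from that list.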
Yes, frontier models from the labs are a step ahead and likely will always be, but we've already crossed levels of "good enough for X" with local models. This is analogous to the fact that my iPhone 17 is technically superior to my iPhone 8, but my outcomes for text messaging are no better.
I've invested heavily in local inference. For me, it's a mixture of privacy, control, stability, and cognitive security.
Privacy - my agents can work on tax docs, personal letters, etc.
Control - I do inference steering with some projects: constraining which token can be generated next at any point in time. Not possible with API endpoints.
Stability - I had many bad experiences with frontier labs' inference quality shifting within the same day, likely due to quantization under system load. Worse, they retire models, update their own system prompts, etc. They're not stable.
Cognitive Security - This has become more important as I rely more on my agents for performing administrative work. This is intermixed with the Control/Stability concerns, but the focus is on whether I can trust it to do what I intended it to do, and that it's acting on my instructions, rather than the labs'.
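For anyone wondering what the "Control" point looks like in practice, here is a toy sketch of constraining which token can be generated next. It assumes you have direct access to the per-token scores (logits) at each step, which is the part local inference gives you; the vocabulary, scores, and `constrained_argmax` helper are invented for illustration and not tied to any specific library.

```python
import math


def constrained_argmax(logits, allowed):
    """Pick the highest-scoring token among `allowed`; all others are masked to -inf."""
    masked = {
        tok: (score if tok in allowed else -math.inf)
        for tok, score in logits.items()
    }
    best = max(masked, key=masked.get)
    if masked[best] == -math.inf:
        raise ValueError("no allowed token present in logits")
    return best


# Toy scores for one decoding step (made-up numbers).
logits = {"yes": 1.2, "no": 0.7, "maybe": 2.5, "{": -0.3}

unconstrained = max(logits, key=logits.get)
constrained = constrained_argmax(logits, allowed={"yes", "no"})
print(unconstrained)  # maybe
print(constrained)    # yes
```

A real setup would apply the same mask to the model's logits before sampling at every step, with the allowed set driven by a grammar or schema.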
I just "invested heavily" (relatively modest, but heavy for me) in a PC for local inference. The RAM was painful. Anyway, for my focused programming tasks the 30B models are plenty good enough.
I’ve been following Peter and his projects for 7-8 months now and you fundamentally mischaracterize him.
Peter was a successful developer prior to this and an incredibly nice guy to boot, so I feel the need to defend him from anonymous hate like this.
What is particularly impressive about Peter is his throughput of publishing *usable utility software*. Over the last year he’s released a couple dozen projects, many of which have seen moderate adoption.
I don’t use the bot, but I do use several of his tools and have also contributed to them.
There is a place in this world for both serious, well-crafted software as well as lower-stakes slop. You don’t have to love the slop, but you would do well to understand that there are people optimizing these pipelines and they will continue to get better.
> I am writing this because almost no one talks about these issues openly, but everyone yelping about Claude Code.
Not sure where you frequent online, but there is ample discussion of these topics within certain niches on X. Happy to point out where to start if that's of interest to you.
As for CEOs, and I assume you're speaking of frontier model lab CEOs, they're pretty much all cashflow-negative at this point, requiring frequent funding raises. That requires a certain amount of overselling. That said, I feel like I've heard substantially fewer AGI claims the last six months...