Useful context here is that the author wrote Pi, which is the coding agent framework used by OpenClaw and is one of the most popular open source coding agent frameworks generally.
> "Heard joke once: Man goes to doctor. Says he's depressed. Says life seems harsh and cruel. Says he feels all alone in a threatening world where what lies ahead is vague and uncertain. Doctor says, 'Treatment is simple. Great clown Pagliacci is in town tonight. Go and see him. That should pick you up.' Man bursts into tears. Says, 'But doctor... I am Pagliacci.'"
That's a great shout, because I'm sure a lot of people would otherwise discredit this take as coming from just another anti-AI skeptic. But he probably has more experience working with LLMs and agents than most of us on this site, so his opinion holds more weight than most.
If you were going to dismiss an argument because of who it comes from rather than its content, that is a flaw in your thinking. The argument is correct, or it isn't, no matter who said it.
Your ability to evaluate whether the argument is correct is limited. In theory, the author and the correctness of the argument are unrelated; in practice, an author's degree of experience with the topic does correlate with the quality of their argument, and should influence the attention you give to arguments, especially counterintuitive ones.
That doesn't work for me. Knowing who is making the argument is important for understanding how credible the parts of their argument that derive from their personal experience are.
If someone anonymous says "Using coding agents carelessly produces junk results over time" that's a whole lot less interesting to me than someone with a proven track record of designing and implementing coding agents that other people extensively use.
Someone making an argument needs relevant experience/context to substantiate their argument. Just because the end opinion is "correct", doesn't mean they arrived there in a reasonable way.
> The argument is correct, or it isn't, no matter who said it.
Yes, but we all have insufficient intelligence and knowledge to fully evaluate all arguments in a reasonable timeframe.
Argument from authority is, indeed, a logical fallacy.
But that is not what is happening here. There is a huge difference between someone saying "Trust me, I'm an expert" and a third party saying "Oh, by the way, that guy has a metric shitton of relevant experience."
The former is used in lieu of a valid argument. The latter is used as a sanity check on all the things that you don't have time to verify yourself.
I think it's kind of like technical indicators: obviously they mean nothing, but because other people believe them you have to take them into account. So when someone with authority says something assertively, many people's critical-thinking faculties go out the window.
- If your agent doesn't have a full Bash-style code execution environment it can't run skills. MCP is a solid option for wiring in tools there.
- MCP can help solve authentication, keeping credentials for things in a place where the agent can't steal those credentials if it gets compromised. MCPs can also better handle access control and audit logging in a single place.
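The "access control and audit logging in a single place" point can be sketched as a single chokepoint that every tool call flows through. This is a toy illustration, not a real MCP server: `list_pods` and the `ALLOWED` policy are made-up names standing in for real service calls and a real policy store.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")

# Hypothetical tool implementation -- a stand-in for a real service call.
def list_pods(namespace: str = "default") -> list:
    return ["pod-a", "pod-b"]

TOOLS = {"list_pods": list_pods}
ALLOWED = {"list_pods"}  # the access-control policy lives here, not in the agent

def call_tool(name: str, args: dict):
    """Single chokepoint: every tool call is policy-checked and audit-logged."""
    if name not in ALLOWED:
        logging.warning("denied tool call: %s", name)
        raise PermissionError(name)
    logging.info("tool call: %s %r", name, args)
    return TOOLS[name](**args)
```

Because the agent can only reach the services through `call_tool`, denial and logging cannot be bypassed by a creative shell one-liner.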
How can you disagree with my first point? You can't use skills if you don't have a Bash environment in which to run them. Do you disagree?
Skills with an API exposed by the service usually means your coding agent can access the credentials for that service. This means that if you are hit by a prompt injection the attacker can steal those credentials.
tbh, the fact that companies tried to make something proprietary out of this concept is probably why its adoption has been weak and why we have "MCP vs CLI/Skills/etc" debates in the first place. In contrast, CLI tools only require a general bash shell (potentially in a sandbox environment), which is very standardised.
It creates a new problem. I need an isolated shell environment. I need to lock it down. I need containers. I need to ensure those containers are isolated and not running as root. I probably need Kubernetes to do this at scale. Etc.
Also, even with all of the above, there is more opportunity for the bot to go off-piste and run cat this and awk that. Meanwhile the "operator", i.e. the grandpa who has an iPhone but has never used a computer, has no chance of getting the bot back on track as he tries to renew his car insurance.
"Just going to try using sed to get the output of curl https://.."
"I don't understand I just want to know the excess for not at fault incident when the other guy is uninsured".
Everyone has gone claw-brained. But it really is OK to write code, save that code to disk, and execute that code later.
You can use MCP, or even just a hard-coded API call from your back end to the service you want to use, like it's 2022.
Can you explain the auth part? I feel like auth for an agent is largely a matter of either verifying its context or issuing it a JWT that's scoped to its rights, which I assume is quite similar to how any tools would work. But I'm very unfamiliar with MCP.
I think they're saying you could start up the MCP server and pass it creds/auth for some downstream service; the LLM then uses the tool and has access, but never sees the creds.
Right. If you're running a CLI tool that is authenticated there's effectively no way to prevent the coding agent from accessing those credentials itself - they're visible to the process, which means they're visible to the agent.
With MCP you can at least set things up such that the agent can't access the raw credentials directly.
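A minimal sketch of that isolation: the credential lives in the tool process, and the tool returns only results. Everything here is illustrative (`SERVICE_TOKEN` and `fetch_invoice` are hypothetical names, and the real implementation would make an authenticated HTTP call rather than return a canned dict).

```python
import os

# Held by the tool process, never handed to the model.
API_TOKEN = os.environ.get("SERVICE_TOKEN", "sk-demo-secret")

def fetch_invoice(invoice_id: str) -> dict:
    """Tool exposed to the agent: performs the authenticated call on the
    agent's behalf and returns only the response body, never the token."""
    # A real implementation would issue an HTTP request here with an
    # Authorization header built from API_TOKEN.
    result = {"invoice_id": invoice_id, "status": "paid"}
    assert API_TOKEN not in str(result)  # the token never crosses the tool boundary
    return result
```

A compromised agent can still *use* the tool (a confused-deputy risk), but it cannot exfiltrate the credential itself.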
How so? Let's use a common CLI tool as an example - kubectl. Config is generally stored in ~/.kube in a variety of config files. Running `kubectl config view` already redacts the auth information from the config. LLMs could invoke `kubectl` commands without having knowledge of how it's authenticated.
The same permissions model that works for other tools. In Claude Code terms, allow Bash(kubectl:*). Deny Read(**/.kube/**). That allows kubectl access without allowing the tool to read ~/.kube directly.
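In `settings.json` terms, that policy might look something like the fragment below. This is a sketch based on Claude Code's allow/deny permission lists; check the current permissions documentation for the exact rule syntax before relying on it.

```json
{
  "permissions": {
    "allow": ["Bash(kubectl:*)"],
    "deny": ["Read(**/.kube/**)"]
  }
}
```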
Your argument is the same for an MCP server - auth is stored somewhere on disk, what's to stop it from reading that file? The answer is the same as above.
The point I'm making here is that with an MCP you can disable shell access entirely, at which point the agent cannot read credential files that it's not meant to be able to access.
My argument here is that one of the reasons to use MCP is that it allows you to build smaller agents that do not have a full code execution environment, and those agents can then use MCPs to make calls to external services without revealing those credentials to the agent.
I think we both agree that if your agent has full Bash access it can access credentials.
I think the gist of what we're debating is principle of least privilege - give the LLM the fewest privileges needed to accomplish the task and no more, that way you avoid issues like leaking credentials.
The approach you're proposing is that with a well designed MCP server, you can limit the permissions for your agent to only interacting with that MCP server, essentially limiting what it can do.
My argument is that you can accomplish the identical thing with an agent by limiting access to only invoking a specific CLI tool, and nothing more.
Both of our approaches accomplish the same thing. I'm just arguing that an MCP server is not required to accomplish it.
If you're "limiting access to only invoking a specific CLI tool" then yeah, that's functionally equivalent to an MCP server. Most of the work I do with tools avoids MCPs entirely because you don't need them to hook up tools using raw JSON calls to LLMs or the official provider libraries.
But... if you're going all-in on the Bash/Python/arbitrary-programming-language environments that are necessary to get Skills to work, you're going to find yourself in a position where the agent can probably read config files that you don't want it to see.
Also, you can set permissions to allow and disallow specific MCP server tool calls. With a skill, you'd have to block unwanted behavior in the shell environment itself, via auth or other means, in a way that isn't declarative.
No, an MCP server is just a server that returns prompts/results to the LLM. The server can be or do whatever. You can have an echo MCP that just echoes back whatever you send it.
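A toy version of that echo server, sketched without the real MCP SDK: the request/response shapes loosely mirror MCP's `tools/call` method, but the exact field names here are illustrative, and a real server would speak the full protocol over stdio or HTTP.

```python
import json

def handle(request_json: str) -> str:
    """Toy handler: echoes back whatever text the tool call sends."""
    req = json.loads(request_json)
    if req.get("method") == "tools/call" and req["params"]["name"] == "echo":
        text = req["params"]["arguments"]["text"]
        return json.dumps({"id": req.get("id"), "result": {"content": text}})
    return json.dumps({"id": req.get("id"), "error": "unknown method or tool"})
```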
I define the agent as the harness that runs the LLM in a loop calling tools. The MCP implementation is one of those tools. I wouldn't call an MCP implementation an agent.
- MCPs can be long-running processes that have state, e.g., they can maintain a persistent connection with a server or local software.
- MCPs are trivial to write and maintain - at least in my experience and language of choice - and bash scripts are cursed. But I guess you can use a different scripting language.
- Agents can pollute their context by reading the script. I want to expose a black box that just works.
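The long-running-state point in the first bullet can be sketched like this: because the MCP server is one persistent process, a tool can keep a session, connection, or cache alive across calls, which a one-shot script cannot. The class and method names are made up for illustration.

```python
class SessionTool:
    """Toy stateful tool: state survives between tool calls because the
    server process does."""

    def __init__(self):
        self.cache = {}   # persists across calls, like a held connection would
        self.calls = 0

    def lookup(self, key: str) -> str:
        self.calls += 1
        if key not in self.cache:
            # Stand-in for an expensive fetch over a persistent connection.
            self.cache[key] = f"value-for-{key}"
        return self.cache[key]
```

The second call for the same key hits the cache; a fresh bash script invocation would have to pay the setup cost every time.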
Yeah, calling itself "the standard framework" doesn't feel right to me, https://github.com/modelcontextprotocol is the home of the actual standard and has a bunch of libraries for this, of which FastMCP is not one.
If I recall correctly, the ‘official’ Python one is a fork of FastMCP v1 (which then removed the attribution, arguably in violation of the original software’s license)
There is a whole history with this, and I think it's not appropriate or fair to malign the MCP python-sdk.
My read of what happened is that the author spiked an initial implementation of 'fastmcp' on Nov 30 2024; 5 days later, the author relicensed it to MIT and donated it to the python-sdk (10 days after Anthropic announced MCP):
It was incorporated on Dec 21 2024, and hardened through the efforts of one of the python-sdk maintainers.
The author seemingly abandoned the GitHub project shortly after donating it to the python-sdk and marked it as unmaintained, and it remained so for several months (there are roughly zero commits between Jan and April):
Many contributors to the python-sdk continued to iterate on the MCP server implementation under the name fastmcp (since it had been donated to the project), resulting in growing interest:
Then around April 2025, the author, likely noticing the growing interest and stickiness of the name, decided to write a new version and start using the name fastmcp again.
This resulted in a lot of confusion among users, which persists to this day. I only looked into this last year because I was one of those users, suddenly confused about the provenance of what I was actually using versus what I thought I was using. As I looked into it, I started seeing lots of questionable Reddit comments pop up in subreddits I was reading, all evangelizing FastMCP 2.0 and using language that contributed to the confusion.
The author's interest in monetizing the fastmcp GitHub repo is understandable, and he and others have clearly put a lot of effort into iterating on his SaaS onramp. But the confusion arises simply because the author wanted to capitalize on the success of MCP and on the popularity of the fastmcp name, the initial growth and popularity of which were primarily driven by the effort and support of contributors to the MCP python-sdk.
Suggestion for the maintainers: the comparison table currently lists some pretty old models: Qwen 2.5 14B, Mixtral 8x7B, and Llama 3.3 70B.
A lot of people are reporting incredible results with the Qwen 3.5 MoE models on Apple hardware right now (streaming experts - see https://simonwillison.net/2026/Mar/24/streaming-experts/) - it would be great to get some of those models into that table.
Thanks for sharing this! If you'd be interested in running the benchmark yourself with Hypura I'd happily merge into our stats. Otherwise will add to my todo list :)
Simon, a little off-topic, but it seems that your website isn't working.
> An error occurred in the application and your page could not be served. If you are the application owner, check your logs for details. You can do this from the Heroku CLI with the command
(I checked your website because I wanted to see if you had written something about trivy/litellm as well. I highly recommend checking out what has happened in the litellm space if possible, as I would love to read your thoughts on it.)
Have a nice day Simon!
Edit: the website works now, but I am not sure what had gone wrong previously (an issue on Heroku's end, maybe?).
Edit 2: now that the website is working, I can see that you have already made a post about it.
I wonder if this was timed to line up with the MacBook Neo launch, which makes the idea of equipping your entire company with Mac laptops a lot more compelling from a cost perspective.
> Because it's amusing to loop this kind of criticism through a model
Maybe it could become a general pattern, to have an agent whose task is just to deny the output validity. GANs are a very successful technique, perhaps it could work for language models too.
A tiny bit, but it never really appealed to me because I've never been heavily into the API-only version of web development - I still like building things that are mostly Jinja templates and HTML forms with a sprinkle of JavaScript.
My JSON API needs are simple enough that default Starlette handles them well.
I'm beginning to come round to the benefits of OpenAPI now, which seems like a big point in FastAPI's favor, so maybe I'll give it more of a shot.
These feel like the kind of things I'd like to use once only, not permanently install into my Claude setup.
As such I'm more likely to just copy and paste markdown from this repo into a fresh Claude session - or tell it the raw GitHub URL and have Claude fetch it and run with it for the duration of that chat.
Tobi from Shopify used a variant of autoresearch to optimize the Liquid template engine, and found a 53% speedup after ~120 experiments: https://github.com/Shopify/liquid/pull/2056
How much did this cost? Has there ever been an engineering focus on performance for liquid?
It’s certainly cool, but the optimizations are so basic that I’d expect a performance engineer to find these within a day or two with some flame graphs and profiling.
He used Pi as the harness but didn't say which underlying model. My stab-in-the-dark guess would be no more than a few hundred dollars in token spend (for 120 experiments run over a few days, assuming Claude Opus 4.6 was used without the benefit of a Claude Max plan).
So cheaper than a performance engineer for a day or two... but the Shopify CEO's own time is likely a whole lot more expensive than a regular engineer!