Useful context here is that the author wrote Pi, which is the coding agent framework used by OpenClaw and is one of the most popular open source coding agent frameworks generally.
> "Heard joke once: Man goes to doctor. Says he's depressed. Says life seems harsh and cruel. Says he feels all alone in a threatening world where what lies ahead is vague and uncertain. Doctor says, 'Treatment is simple. Great clown Pagliacci is in town tonight. Go and see him. That should pick you up.' Man bursts into tears. Says, 'But doctor... I am Pagliacci.'"
That's a great shout, because I'm sure a lot of people would otherwise discredit this take as coming from just another anti-AI skeptic. But he probably has more experience working with LLMs and agents than most of us on this site, so his opinion holds more weight than most.
If you were going to dismiss an argument because of who it comes from rather than its content, that is a flaw in your thinking. The argument is correct, or it isn't, no matter who said it.
Your ability to evaluate whether the argument is correct is limited. In theory, the author and the correctness of the argument are unrelated; in practice, an author's degree of experience with the topic does correlate with the quality of their argument, and should influence the attention you give to arguments, especially counterintuitive ones.
That doesn't work for me. Knowing who is making the argument is important for understanding how credible the parts of their argument that derive from their personal experience are.
If someone anonymous says "Using coding agents carelessly produces junk results over time" that's a whole lot less interesting to me than someone with a proven track record of designing and implementing coding agents that other people extensively use.
Someone making an argument needs relevant experience/context to substantiate their argument. Just because the end opinion is "correct", doesn't mean they arrived there in a reasonable way.
> The argument is correct, or it isn't, no matter who said it.
Yes, but we all have insufficient intelligence and knowledge to fully evaluate all arguments in a reasonable timeframe.
Argument from authority is, indeed, a logical fallacy.
But that is not what is happening here. There is a huge difference between someone saying "Trust me, I'm an expert" and a third party saying "Oh, by the way, that guy has a metric shitton of relevant experience."
The former is used in lieu of a valid argument. The latter is used as a sanity check on all the things that you don't have time to verify yourself.
I think it's kind of like technical indicators: obviously they mean nothing, but because other people believe them you have to take them into account. So when someone with authority says something assertively, many people's critical-thinking faculties go out the window.
- If your agent doesn't have a full Bash-style code execution environment it can't run skills. MCP is a solid option for wiring in tools there.
- MCP can help solve authentication, keeping credentials for things in a place where the agent can't steal those credentials if it gets compromised. MCPs can also better handle access control and audit logging in a single place.
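The "access control and audit logging in a single place" point can be sketched as a single chokepoint that every tool call flows through. This is a toy illustration, not a real MCP server: `list_pods` and the `ALLOWED` policy are made-up names standing in for real service calls and a real policy store.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")

# Hypothetical tool implementation -- a stand-in for a real service call.
def list_pods(namespace: str = "default") -> list:
    return ["pod-a", "pod-b"]

TOOLS = {"list_pods": list_pods}
ALLOWED = {"list_pods"}  # the access-control policy lives here, not in the agent

def call_tool(name: str, args: dict):
    """Single chokepoint: every tool call is policy-checked and audit-logged."""
    if name not in ALLOWED:
        logging.warning("denied tool call: %s", name)
        raise PermissionError(name)
    logging.info("tool call: %s %r", name, args)
    return TOOLS[name](**args)
```

Because the agent can only reach the services through `call_tool`, denial and logging cannot be bypassed by a creative shell one-liner.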
How can you disagree with my first point? You can't use skills if you don't have a Bash environment in which to run them. Do you disagree?
Skills with an API exposed by the service usually means your coding agent can access the credentials for that service. This means that if you are hit by a prompt injection the attacker can steal those credentials.
tbh, the fact that companies tried to make something proprietary out of this concept is probably why its adoption has been weak and why we have "MCP vs CLI/Skills/etc" debates in the first place. In contrast, CLI tools only require a general bash shell (potentially in a sandbox environment), which is very standardised.
It creates a new problem. I need an isolated shell environment. I need to lock it down. I need containers. I need to ensure those containers are isolated and not running as root. I probably need Kubernetes to do this at scale. Etc.
Also, even with all of the above, there is more opportunity for the bot to go off-piste and run cat this and awk that. Meanwhile the "operator", i.e. the grandpa who has an iPhone but has never used a computer, has no chance of getting the bot back on track as he tries to renew his car insurance.
"Just going to try using sed to get the output of curl https://.."
"I don't understand I just want to know the excess for not at fault incident when the other guy is uninsured".
Everyone has gone claw-brained. But it really is OK to write code, save that code to disk, and execute that code later.
You can use MCP, or even just a hard-coded API call from your back end to the service you want to use, like it's 2022.
Can you explain the auth part? I feel like auth for an agent is largely a matter of either verifying its context or issuing it a JWT that's scoped to its rights, which I assume is quite similar to how any tools would work. But I'm very unfamiliar with MCP.
I think they're saying you could start up the MCP server and pass it creds/auth for some downstream service; the LLM then uses the tool and has access, but never sees the creds.
Right. If you're running a CLI tool that is authenticated there's effectively no way to prevent the coding agent from accessing those credentials itself - they're visible to the process, which means they're visible to the agent.
With MCP you can at least set things up such that the agent can't access the raw credentials directly.
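A minimal sketch of that isolation: the credential lives in the tool process, and the tool returns only results. Everything here is illustrative (`SERVICE_TOKEN` and `fetch_invoice` are hypothetical names, and the real implementation would make an authenticated HTTP call rather than return a canned dict).

```python
import os

# Held by the tool process, never handed to the model.
API_TOKEN = os.environ.get("SERVICE_TOKEN", "sk-demo-secret")

def fetch_invoice(invoice_id: str) -> dict:
    """Tool exposed to the agent: performs the authenticated call on the
    agent's behalf and returns only the response body, never the token."""
    # A real implementation would issue an HTTP request here with an
    # Authorization header built from API_TOKEN.
    result = {"invoice_id": invoice_id, "status": "paid"}
    assert API_TOKEN not in str(result)  # the token never crosses the tool boundary
    return result
```

A compromised agent can still *use* the tool (a confused-deputy risk), but it cannot exfiltrate the credential itself.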
How so? Let's use a common CLI tool as an example - kubectl. Config is generally stored in ~/.kube in a variety of config files. Running `kubectl config view` already redacts the auth information from the config. LLMs could invoke `kubectl` commands without having knowledge of how it's authenticated.
The same permissions model that works for other tools. In Claude Code terms, allow Bash(kubectl:*). Deny Read(**/.kube/**). That allows kubectl access without allowing the tool to read ~/.kube directly.
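In `settings.json` terms, that policy might look something like the fragment below. This is a sketch based on Claude Code's allow/deny permission lists; check the current permissions documentation for the exact rule syntax before relying on it.

```json
{
  "permissions": {
    "allow": ["Bash(kubectl:*)"],
    "deny": ["Read(**/.kube/**)"]
  }
}
```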
Your argument is the same for an MCP server - auth is stored somewhere on disk, what's to stop it from reading that file? The answer is the same as above.
The point I'm making here is that with an MCP you can disable shell access entirely, at which point the agent cannot read credential files that it's not meant to be able to access.
My argument here is that one of the reasons to use MCP is that it allows you to build smaller agents that do not have a full code execution environment, and those agents can then use MCPs to make calls to external services without revealing those credentials to the agent.
I think we both agree that if your agent has full Bash access it can access credentials.
I think the gist of what we're debating is principle of least privilege - give the LLM the fewest privileges needed to accomplish the task and no more, that way you avoid issues like leaking credentials.
The approach you're proposing is that with a well designed MCP server, you can limit the permissions for your agent to only interacting with that MCP server, essentially limiting what it can do.
My argument is that you can accomplish the identical thing with an agent by limiting access to only invoking a specific CLI tool, and nothing more.
Both of our approaches accomplish the same thing. I'm just arguing that an MCP server is not required to accomplish it.
If you're "limiting access to only invoking a specific CLI tool" then yeah, that's functionally equivalent to an MCP server. Most of the work I do with tools avoids MCPs entirely because you don't need them to hook up tools using raw JSON calls to LLMs or the official provider libraries.
But... if you're going all-in on the Bash/Python/arbitrary-programming-language environments that are necessary to get Skills to work, you're going to find yourself in a position where the agent can probably read config files that you don't want it to see.
Also, you can set permissions to allow and disallow specific MCP server tool calls. With a skill, you'd have to block unwanted behavior in the shell environment itself, via auth or other means, in a way that isn't declarative.
No, an MCP server is just a server that returns prompts/results to the LLM. The server can be or do whatever. You can have an echo MCP that just echoes back whatever you send it.
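A toy version of that echo server, sketched without the real MCP SDK: the request/response shapes loosely mirror MCP's `tools/call` method, but the exact field names here are illustrative, and a real server would speak the full protocol over stdio or HTTP.

```python
import json

def handle(request_json: str) -> str:
    """Toy handler: echoes back whatever text the tool call sends."""
    req = json.loads(request_json)
    if req.get("method") == "tools/call" and req["params"]["name"] == "echo":
        text = req["params"]["arguments"]["text"]
        return json.dumps({"id": req.get("id"), "result": {"content": text}})
    return json.dumps({"id": req.get("id"), "error": "unknown method or tool"})
```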
I define the agent as the harness that runs the LLM in a loop calling tools. The MCP implementation is one of those tools. I wouldn't call an MCP implementation an agent.
- MCPs can be long-running processes that have state, e.g., they can maintain a persistent connection with a server or local software.
- MCPs are trivial to write and maintain - at least in my experience and language of choice - and bash scripts are cursed. But I guess you can use a different scripting language.
- Agents can pollute their context by reading the script. I want to expose a black box that just works.
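The long-running-state point in the first bullet can be sketched like this: because the MCP server is one persistent process, a tool can keep a session, connection, or cache alive across calls, which a one-shot script cannot. The class and method names are made up for illustration.

```python
class SessionTool:
    """Toy stateful tool: state survives between tool calls because the
    server process does."""

    def __init__(self):
        self.cache = {}   # persists across calls, like a held connection would
        self.calls = 0

    def lookup(self, key: str) -> str:
        self.calls += 1
        if key not in self.cache:
            # Stand-in for an expensive fetch over a persistent connection.
            self.cache[key] = f"value-for-{key}"
        return self.cache[key]
```

The second call for the same key hits the cache; a fresh bash script invocation would have to pay the setup cost every time.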
Yeah, calling itself "the standard framework" doesn't feel right to me, https://github.com/modelcontextprotocol is the home of the actual standard and has a bunch of libraries for this, of which FastMCP is not one.
If I recall correctly, the ‘official’ Python one is a fork of FastMCP v1 (which then removed the attribution, arguably in violation of the original software’s license)
There is a whole history with this, and I think it's not appropriate or fair to malign the MCP python-sdk.
My read of what happened is that the author spiked an initial implementation of 'fastmcp' on Nov 30 2024; 5 days later, the author relicensed it to MIT and donated it to the python-sdk (10 days after Anthropic announced MCP):
It was incorporated on Dec 21 2024, and hardened through the efforts of one of the python-sdk maintainers.
The author seemingly abandoned the GitHub project shortly after donating it to the python-sdk and marked it as unmaintained, and it remained so for several months (there are roughly zero commits between Jan and April):
Many contributors to the python-sdk continued to iterate on the MCP server implementation under the name fastmcp (since it had been donated to the project), resulting in growing interest:
Then around April 2025, the author, likely noticing the growing interest and stickiness of the name, decided to write a new version and start using the name fastmcp again.
This resulted in a lot of confusion among users, which persists to this day. I only looked into this last year because I was one of those users, suddenly confused about the provenance of what I was actually using versus what I thought I was using. As I looked into it, I started seeing lots of questionable Reddit comments pop up in subreddits I was reading, all evangelizing FastMCP 2.0 and using language that contributed to the confusion.
The author's interest in monetizing the fastmcp GitHub repo is understandable, and he and others have clearly put a lot of effort into iterating on his SaaS onramp. But the confusion arises simply because the author wanted to capitalize on the success of MCP and on the popularity of the fastmcp name, the initial growth and popularity of which were primarily driven by the effort and support of contributors to the MCP python-sdk.
Suggestion for the maintainers: the comparison table currently lists some pretty old models: Qwen 2.5 14B, Mixtral 8x7B, and Llama 3.3 70B.
A lot of people are reporting incredible results with the Qwen 3.5 MoE models on Apple hardware right now (streaming experts - see https://simonwillison.net/2026/Mar/24/streaming-experts/) - it would be great to get some of those models into that table.
Thanks for sharing this! If you'd be interested in running the benchmark yourself with Hypura I'd happily merge into our stats. Otherwise will add to my todo list :)
Simon, a little off-topic, but it seems that your website isn't working.
> An error occurred in the application and your page could not be served. If you are the application owner, check your logs for details. You can do this from the Heroku CLI with the command
(I checked your website because I wanted to see if you had written something about trivy/litellm as well. I highly recommend checking out what has happened in the litellm space if possible, as I would love to read your thoughts on it.)
Have a nice day Simon!
Edit: the website works now, but I am not sure what had gone wrong previously (an issue on Heroku's end, maybe?).
Edit 2: now that the website is working, I can see that you have already made a post about it.
I wonder if this was timed to line up with the MacBook Neo launch, which makes the idea of equipping your entire company with Mac laptops a lot more compelling from a cost perspective.
> Because it's amusing to loop this kind of criticism through a model
Maybe it could become a general pattern, to have an agent whose task is just to deny the output validity. GANs are a very successful technique, perhaps it could work for language models too.
A tiny bit, but it never really appealed to me because I've never been heavily into the API-only version of web development - I still like building things that are mostly Jinja templates and HTML forms with a sprinkle of JavaScript.
My JSON API needs are simple enough that default Starlette handles them well.
I'm beginning to come round to the benefits of OpenAPI now, which seems like a big point in FastAPI's favor, so maybe I'll give it more of a shot.
These feel like the kind of things I'd like to use once only, not permanently install into my Claude setup.
As such I'm more likely to just copy and paste markdown from this repo into a fresh Claude session - or tell it the raw GitHub URL and have Claude fetch it and run with it for the duration of that chat.
Tobi from Shopify used a variant of autoresearch to optimize the Liquid template engine, and found a 53% speedup after ~120 experiments: https://github.com/Shopify/liquid/pull/2056
How much did this cost? Has there ever been an engineering focus on performance for liquid?
It’s certainly cool, but the optimizations are so basic that I’d expect a performance engineer to find these within a day or two with some flame graphs and profiling.
He used Pi as the harness but didn't say which underlying model. My stab-in-the-dark guess would be no more than a few hundred dollars in token spend (for 120 experiments run over a few days, assuming Claude Opus 4.6 was used without the benefit of a Claude Max plan).
So cheaper than a performance engineer for a day or two... but the Shopify CEO's own time is likely a whole lot more expensive than a regular engineer!