OK, let’s survey how everybody is sandboxing their AI coding agents in early 2026.
What I’ve seen suggests the most common answers are (a) “containers” and (b) “YOLO!” (maybe adding, “Please play nice, agent.”).
One approach that I’m about to try is Sandvault [0] (macOS only), which uses the good old Unix user system together with some added precautions. Basically, give an agent its own unprivileged user account and interact with it via sudo, SSH, and shared directories.
I use KVM/QEMU on Linux. I have a set of scripts that I use to create a new directory with a VM project and that also installs a debian image for the VM. I have an ./pull_from_vm and ./push_to_vm that I use to pull and push the git code to and from the vm. As well as a ./claude to start claude on the vm and a ./emacs to initialize and start emacs on the vm after syncing my local .spacemacs directory to the vm (I like this because of customized emacs muscle memory and because I worry that emacs can execute arbitrary code if I use it to ssh to the VM client from my host).
I try not to run LLM's directly on my own host. The only exception I have is that I do use https://github.com/karthink/gptel on my own machine, because it is just too damn useful. I hope I don't self own myself with that someday.
I'm mainly addressing sandboxing by running stuff in Claude Code for web, at which point it's Anthropic's problem if they have a sandbox leak, not mine.
It helps that most of my projects are open source so I don't need to worry about prompt injection code stealing vulnerabilities. That way the worst that can happen would be an attack adding a vulnerability to my code that I don't spot when I review the PR.
And turning off outbound networking should protect against code stealing too... but I allow access to everything because I don't need to worry about code stealing and that way Claude can install things and run benchmarks and generally do all sorts of other useful bits and pieces.
Containers here, though I don't run Claude Code within containers, nor do I pass `--dangerously-skip-permissions`. Instead, I provide a way for agents to run commands within containers.
These containers only have the worker agent's workspace and some caching dirs (e.g. GOMODCACHE) mounted, and by default have `--network none` set. (Some commands, like `go mod download`, can be explicitly exempted to have network access.)
I also use per-skill hooks to enforce more filesystem isolation and check if an agent attempts to run e.g. `go build`, and tell it to run `aww exec go build` instead. (AWW is the name of the agent workflow system I've been developing over the past month—"Agent Workflow Wrangler.")
This feels like a pragmatic setup. I'm sure it's not riskless, but hopefully it does enough to mitigate the worst risks. I may yet go back to running Claude Code in a dedicated VM, along with the containerized commands, to add yet another layer of isolation.
The interesting thing in that thread is how many people have landed on isolation as a workaround while still lacking a real control plane on top of it. Containers reduce blast radius, but they don’t answer approvals, policy, or auditability. That’s the gap I keep seeing in these setups. I've found a software, called Daedalab, that instead of sandboxing AI puts deterministic control on agents actions.
The sandboxing options are set when you connect the MCP to the agent, not by the agent passing params about its own sandbox.
There’s a misconception about the right security boundary for agents. The agent code needs secrets (API keys, prompts, code) and the network (docs, other use cases). Wrapping the whole agent in a container puts secrets, network access, and arbitrary agent cli execution into the same host OS.
If you sandbox just the agent’s CLI access, then it’s can’t access its own API keys/code/host-OS/etc.
But I use that specifically to run 'user-emulation' stories where an agent starts in their own `~/` environment with my tarball at ~/Downloads/app.tar.gz, and has to find its way through the docs / code / cli's and report on the experience.
There's an intermediate step, which is to use a combination of claude code sandboxing (bubblewrap), plus some pre tool hooks to look for sketchy commands, but it's still interactive and probably not the right longterm approach.
What I’ve seen suggests the most common answers are (a) “containers” and (b) “YOLO!” (maybe adding, “Please play nice, agent.”).
One approach that I’m about to try is Sandvault [0] (macOS only), which uses the good old Unix user system together with some added precautions. Basically, give an agent its own unprivileged user account and interact with it via sudo, SSH, and shared directories.
0. https://github.com/webcoyote/sandvault