We've been working on exactly this at Islo: zero-setup microVM sandboxes with isolated networking by default, plus an approval workflow layer so agents can request capabilities and humans can approve or deny them in real time.
The credential problem is handled through proxy middleware: agents never see real tokens, and requests are routed through policy-checked proxies that inject credentials only for approved operations.
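To make the credential-proxy idea concrete, here is a minimal Python sketch of the pattern, not Islo's actual implementation. The `APPROVED` policy table, the `SECRETS` store, the host `api.example.com` and the `Request` type are all made up for illustration. The point is only that the agent's request carries no real token and the proxy adds one after the policy check passes.

```python
from dataclasses import dataclass

# Hypothetical policy: (method, host, path prefix) combinations approved for this agent.
APPROVED = {("GET", "api.example.com", "/v1/issues")}

# Real secrets live only on the proxy side; the agent never receives them.
SECRETS = {"api.example.com": "real-token-123"}


@dataclass
class Request:
    method: str
    host: str
    path: str
    headers: dict


def is_approved(req: Request) -> bool:
    """Check the request against the (assumed) policy table."""
    return any(
        req.method == method and req.host == host and req.path.startswith(prefix)
        for method, host, prefix in APPROVED
    )


def proxy(req: Request) -> Request:
    """Return the outbound request with credentials injected, or raise."""
    if not is_approved(req):
        raise PermissionError(f"{req.method} {req.host}{req.path} is not approved")
    # Drop anything the agent put in Authorization, then inject the real token.
    headers = {k: v for k, v in req.headers.items() if k.lower() != "authorization"}
    headers["Authorization"] = f"Bearer {SECRETS[req.host]}"
    return Request(req.method, req.host, req.path, headers)
```

An unapproved call (say, a `DELETE` on the same path) fails at the proxy, so the agent never holds a token it could misuse.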
The paper nails it - we're giving agents capabilities before we have infra to contain them. The answer isn't better prompts. It's treating agent execution like untrusted code: sandboxed VMs, explicit capability grants, network isolation, approval workflows for production actions.
Prompt guardrails are theater - they work until they don't. We ended up building sandboxed execution for each agent action. The agent proposes what it wants to do, but execution happens in an isolated microVM with explicit capability boundaries. Database writes require an additional approval step that is architecturally separate from the LLM context.
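Roughly, the propose/execute split looks like the Python sketch below. The names (`Proposal`, `Executor`, the capability strings) are invented for illustration, and the `log.append` is a stand-in for dispatching into the isolated microVM. What matters is that the approval callback is wired in by the host, so nothing in the LLM context can answer it.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Proposal:
    action: str    # hypothetical capability name, e.g. "db.write"
    payload: dict  # arguments the agent wants to run the action with


@dataclass
class Executor:
    granted: set                          # capabilities explicitly granted to this sandbox
    needs_approval: set                   # actions gated on a human decision
    approve: Callable[[Proposal], bool]   # approval channel that lives outside the LLM context
    log: list = field(default_factory=list)

    def run(self, proposal: Proposal) -> str:
        if proposal.action not in self.granted:
            return "denied: capability not granted"
        if proposal.action in self.needs_approval and not self.approve(proposal):
            return "denied: approval rejected"
        # Stand-in for running the action inside the isolated microVM.
        self.log.append(proposal)
        return "executed"
```

With `granted={"fs.read", "db.write"}` and `needs_approval={"db.write"}`, a read goes straight through, a write waits on `approve`, and anything else is refused no matter what the model outputs.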
Worth looking at islo.dev if you want the sandboxing piece without building it yourself.
Sandboxed execution is solid for isolation — separating proposal from execution is the right architecture. The piece we kept hitting was the policy layer on top: who defines what the agent is allowed to propose in the first place, and how do you update those rules without a redeploy every time?
The pattern only works if the tool enforces the OTP, i.e. the CLI doesn't perform the dangerous action until it receives the OTP through a path the agent can't spoof. If the tool just returns "ask the user for OTP", the agent relays that to the user, and then passes whatever the user types back into the tool, then all of the security lives in the tool's implementation: it must verify the OTP (e.g. server-side or via a channel that bypasses the agent, as stavros described) and only then execute. The all-caps message is then UX for the human and a hint to the agent, not the actual gate. So the question "does it actually require an OTP?" is the right one: if the tool code doesn't block on a real OTP check, it's hope, not a security model.
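Here is a minimal Python sketch of what "the tool enforces the OTP" could look like. It is not the CLI being discussed. `OtpGatedTool`, `notify_human` and `action` are hypothetical names, and `notify_human` is assumed to be a channel the agent cannot read (SMS, a separate terminal, etc.).

```python
import hmac
import secrets
from typing import Callable, Optional


class OtpGatedTool:
    def __init__(self, notify_human: Callable[[str], None], action: Callable[[], str]):
        self._notify_human = notify_human  # out-of-band channel, assumed unreachable by the agent
        self._action = action              # the dangerous operation being gated
        self._pending: Optional[str] = None

    def request(self) -> str:
        """Called by the agent. Sends a fresh OTP to the human; returns only a prompt."""
        self._pending = f"{secrets.randbelow(10**6):06d}"
        self._notify_human(self._pending)
        return "ASK THE USER FOR THE OTP"  # UX hint only, not the gate

    def confirm(self, otp: str) -> str:
        """Called by the agent with whatever the user typed. Executes only on a match."""
        pending, self._pending = self._pending, None  # single use, even on failure
        if pending is None or not hmac.compare_digest(pending.encode(), otp.encode()):
            raise PermissionError("invalid or missing OTP")
        return self._action()
```

The gate is the check in `confirm`, not the returned string. If the agent ignores the prompt or makes up a code, the action never runs.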
The other approach is to not give the agent access to the thing that needs protecting. Run the agent in an isolated environment - sandbox, VM, separate machine - so it never has the ability to email-blast or nuke your files in the first place. Then you're not depending on the agent to obey the prompt or on the human to be present for every dangerous call. Human-in-the-loop (or OTP-in-the-loop) is a reasonable layer when the agent has broad access; isolation is the layer that makes the blast radius zero. We're building https://islo.dev for that: agents run in isolation, host is out of scope, so you can let them run without approval prompts and still sleep at night.
Thanks for the feedback! If we look at other products in similar categories, the first thing they lead with is the value proposition. You want to tell your viewer a story, and they will "remember" the pain by themselves, so there's no need to spell it out in our own words. Or at least that's what we believe.
I’m Adam, co-founder at Kypso. Today, we are super excited to launch Kypso, an AI copilot to manage and scale your teams’ operations across everyday tools.
We’ve worked on it for the past few months in close collaboration with our beta testers, and now we feel ready to show it to all of you!
With Kypso, you can:
- End every important discussion with a decision
Say goodbye to those "What's the plan again?" moments. With Kypso in your corner, important chats don't just fizzle out – they end with a clear decision that everyone's on board with. No more missed chances or endless delays. Let's get those projects moving forward, crystal clear and totally in sync!
- Identify unclear scopes before they become a problem
We've all been there – projects getting all tangled up because the goals weren't clear from the get-go. No more project wilderness! Kypso's got your back, helping you spot those fuzzy scopes right from the start. Say goodbye to scope creep nightmares and project bumps. Keep things smooth, on point, and totally budget-friendly!
- Share progress updates on a defined schedule
Who doesn't love a good update? Kypso's got your back when it comes to keeping everyone in the loop. Whether it's daily, weekly, or somewhere in between, you set the rhythm. Transparency, trust, and high-fives all around – that's how we roll!
- Create your own powerful use case
Customize Kypso to fit your style, your flow, and your team's groove. Get ready to unlock the full potential of your teams' operations, all while using the tools you already love.
Happy to share more: https://islo.dev