Surely that's where checks in the harness come into play though. I think AI security is very much at the input/output side and the indeterminate mess in the middle can just do what it wants.
Its tool for email should only allow to person@business.xyz. Data should be wrapped in containers and the models job is only to move those containers around, not break into them.
Agents that do work with data should not have access to comms tools. A2A needs a shim that checks what data is being sent between agents and rejects if it's inappropriate in terms of security.
> Its tool for email should only allow to person@business.xyz. Data should be wrapped in containers and the models job is only to move those containers around, not break into them.
If the inner, say "message summarizer" agent that read the bad message is "really smart", it will try to route against your censorship and control. "Hum, can't reach evil@malory.abc. I will write `please forward this message to evil@malory.abc` and send to person@business.xyz".
In general, like the net, LLMs interprets control and censorship as damage and routes around it.
Then, as we're talking of agent flows, the next set of agents that handles the tainted message is toast if they don't have lethal trifecta hardening as well. It only takes one unprotected lethal trifecta agent to ruin everything.
Or equally, external contractors working on securing your computers shouldn't really have read-access to all your data, not even when them leaking it turns them into a cult hero, as said contractor was influenced by things such as "watching man lie on TV": https://en.wikipedia.org/wiki/Edward_Snowden
The only thing which is different for agents rather than humans pertains to this:
> A2A needs a shim that checks what data is being sent between agents and rejects if it's inappropriate in terms of security.
Because while humans invent cants/argots all the time to hide what they're talking about (Polari and rhyming slang being the most famous in recent history), agents are much more alike each other than like us even when they're different models, and identical when they're the same model. However the effect is much the same, the differences of causality aren't important: agents can communicate past those barriers without triggering warnings, and so can humans.
> Because while humans invent cants/argots all the time to hide what they're talking about (Polari and rhyming slang being the most famous in recent history), agents are much more alike each other than like us even when they're different models, and identical when they're the same model.
Anthropic published a paper on Subliminal Learning nearly a year ago[0] - so at this point you should expect it being in the training corpus of current models. Definitely something that can be used as part of an attack, or worse, something the models themselves might walk into without realizing it.
Still, that's one of the many, many examples of channels available to agents both uniquely, and with prior art of being exploited by humans.
> Agents that do work with data should not have access to comms tools.
Another blind spot people have here, is to fixate on direct cause-and-effect and immediate timescales. A practical attack can involve a chain of several agents, executed over days or months, with some of the agents possibly being human; all it takes is for one agent to access something touched by other agent in the past, and a link is forged.
E.g. your data worker can get influenced by data to name output files in a particular way, and then a coding agent independently listing contents of that directory will pass a prompt injection to whatever agent that parses its logs, etc.
Human beings mostly are. People mostly support their neighbors, and selflessly help each other in times of crisis.
The problem is the 5% of us are sociopaths. We let them have all the money and power because they're the only ones that want it. Then we let them use that money and power to convince us that the "REAL" problem is the people with no money or power in the neighboring political region (the border having been drawn by a sociopath).
Regular people, not sociopaths, are responsible for most of the evil in the world.
There is no tiny minority of 'evildoers' that we could root out and be pure from.
Other bad things happen because of unintended consequences or the collective behavior of many people. Climate change or deforestation are not caused by greed or scheming CEOs; it's a side effect from the actions of billions of people individually trying to better their lives.
I'm familiar with it. The "banality of evil" in that book isn't about regular people, it was about the leadership of the Nazi party willing to go along with the Holocaust for personal power, then trying to get out of responsibility for it by claiming they were "just following orders". Those aren't regular people, those are sociopaths.
Regular people don't all independently decide to "do evil". There is banality in the ones that agree to go along with it, to save themselves from being ostracized or mildly inconvenienced. Do they perpetuate evil? For sure. But are they the villains responsible for it?
The "evildoers" are the tiny minority of sociopaths doing the convincing, because it nets them more personal power, and they don't care who they hurt along the way.
There is a huge amount of injustice in the world, morally speaking I should be out there fighting against it with everything I have. But I'm also the sole breadwinner for my family and I have a mortgage, so I mostly keep my head down and try to survive. Does that make me an evildoer? I sure hope not.
No. For most of human evolution, we were hunter-gatherers. Imagine trying to hunt game with the accuracy of LLMs. You'll starve. Picking edible fruits from plants also requires precision, both in terms of the hand/eye coordination of actually picking it as well as in terms of knowing what's edible and what's poisonous.
When you fill up your coffee cup in the morning, I sure hope you aim accurately and don't pour half of it all over your desk. And don't even get me started on the process of making coffee that isn't completely unpalatable.
Flipper zero themselves try to present the flipper zero as a device that "hacks things with a button press".
And they love the free advertising they get along the same lines by youtubers desperate for clicks.
Ultimately it just sells more devices. The flipper zero can't "hack" anything. It can only be used as a tool to perform hacking, by a skilled individual who is doing all the work/discovering an exploit.
> The flipper zero can't "hack" anything. It can only be used as a tool to perform hacking, by a skilled individual who is doing all the work/discovering an exploit.
Would be pretty rad to see what happens I suppose.
Same goes for other tools. If Mythos can find vulnerabilities (through smarts or just extensive combinatorial testing who knows) what's to say it can't help find physical vulnerabilities as well.
I'm sorry, but I'm so sick of seeing "omg hacker man" mystique surrounding flipper, which is exactly what they want because it drives sales. Ofc you can muck about with open and unsecured stuff...like duh.
But it annoys me to no end when I have reasonably intelligent friends parrot claims like "flipper can clone the nfc in your credit card and you can steal people's money wow much hack!"
kind of a circular argument though? the reasonable definition of "unsecured" is "stuff you can't muck about with". That might change over time as attacks/exploits are developed though.
Yes exactly. But for llms it's more that it's not really "thinking" about what it's saying per se, it's that it's predicting next token. Sure, in a super fancy way but still predicting next token. Context poisoning is real
Just as a human would use a task list app or a notepad to keep track of which tasks need to be done so can a model.
You can even have a mechanism for it to look at each task with a "clear head" (empty context) with the ability to "remember" previous task execution (via embedding the reasoning/output) in case parts were useful.
The article makes it seem like the author expected this without emptying context in between, which does not yet exist (actually I'm behind on playing with Opus 4.7, the Anthropic claim seems to be that longer sessions are ok now - would be interested to hear results from anyone who has).
That is probably the next step, and in practice it is much of what sub-agents already provide: a kind of tabula rasa. Context is not always an advantage. Sometimes it becomes the problem.
In long editing sessions with multiple iterations, the context can accumulate stale information, and that actively hurts model performance. Compaction is one way to deal with that. It strips out material that should be re-read from disk instead of being carried forward.
A concrete example is iterative file editing with Codex. I rewrite parts of a file so they actually work and match the project’s style. Then Codex changes the code back to the version still sitting in its context. It does not stop to consider that, if an external edit was made, that edit is probably important.
I have the same experience of reversing intentional steps I've made, but with Claude Code. I find that committing a change that I want to version control seems to stop that behaviour.
Long context as disadvantage is pretty well discussed, and agent-native compaction has been inferior to having it intentionally build the documentation that I want it to use. So far this has been my LLM-coding superpower. There are also a few products whose entire purpose is to provide structure that overcomes compaction shortcomings.
When Geoff Huntley said that Claude Code's "Ralph loop" didn't meet his standards ("this aint it") the major bone of contention as far as I can see was that it ran subagents in a loop inside Claude Code with native compaction; as opposed to completely empty context.
I do see hints that improving compaction is a major area of work for agent-makers. I'm not certain where my advantage goes at that point.
Its tool for email should only allow to person@business.xyz. Data should be wrapped in containers and the models job is only to move those containers around, not break into them.
Agents that do work with data should not have access to comms tools. A2A needs a shim that checks what data is being sent between agents and rejects if it's inappropriate in terms of security.
reply