This is because for some reason all agentic systems think that slapping cron on it is enough, but that completely ignores decades of knowledge about prospective memory. Take a look at https://theredbeard.io/blog/the-missing-memory-type/ for a write-up on exactly that.
It’s a self-fulfilling prophecy. They’re extremely expensive, so they must be good, so they must be worth it. And because measurement at that level is extremely subjective, it’s mainly about the vibes.
A vibe? It’s completely obvious AI slop with no attempt to make it legible. They didn’t even prompt out the em dashes. For such a cool finding, this is extremely disappointing.
It's a fair question. I've had problems with Gemini 3 due to rate limiting, and I've been working on this for a while now. I'm planning to cover Gemini 3 in a follow-up.
It’s not groundbreaking in a technological sense. The codebase is actually a bit of a monstrosity. But it removed guardrails that had been artificially placed on these LLMs, which suddenly gave them an entirely new dimension, and the timing was right.
I built this because I was curious what Claude sends to the API, how work gets delegated to subagents, and what the contexts look like. It's interesting to see how small a part of the context the user interaction typically is.