Hacker News | meisnerd's comments

You're right that this is fundamentally a signal flow problem. The whole architecture is designed around that — tasks, inbox messages, decisions, and activity log are all just JSON files that any agent can read/write. The daemon polls for new signals and dispatches work automatically.

On email specifically: great callout about structured vs raw MIME. My current thinking is a parsed webhook approach — sender, subject, body, thread ID as structured fields. Raw MIME would be a nightmare for agents to parse and would waste tokens on headers nobody cares about. The GitHub Issues integration is closer to shipping (it's one of our open issues now); email is further out, but the inbox architecture already supports it — it just needs an ingest layer.

Task provenance is an interesting gap you've identified. Right now tasks have createdAt, assignedTo, and the activity log tracks every state change, but there's no explicit "originated from" field linking back to an email or Slack message. That would be a clean addition to the schema — something like source: { type: "email" | "github" | "manual", ref: "..." }. Appreciate the specific feedback.
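A hypothetical shape for that field — the `TaskSource` type and field names here are illustrative, not part of the current schema:

```typescript
// Hypothetical provenance field for the task schema (illustrative names).
type TaskSource =
  | { type: "email"; ref: string }   // e.g. a message or thread ID
  | { type: "github"; ref: string }  // e.g. "owner/repo#123"
  | { type: "manual"; ref?: string };

interface Task {
  id: string;
  createdAt: string;
  assignedTo: string;
  source?: TaskSource; // optional, so existing tasks stay valid
}

// Example: a task ingested from a GitHub issue.
const exampleTask: Task = {
  id: "t-42",
  createdAt: new Date().toISOString(),
  assignedTo: "developer",
  source: { type: "github", ref: "owner/repo#123" },
};
```

Making the field optional keeps old task files valid, and the discriminated union lets each ingest layer attach its own kind of reference.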


100% agree on making debugging obvious. That's exactly why everything is local JSON files — you can literally cat tasks.json | jq '.tasks[] | select(.kanban=="in-progress")' and see exactly what's happening. No database queries, no admin panels, just files.

The activity log captures every state change with timestamps and actor IDs, so when something breaks you can trace the exact sequence. And since agents communicate through inbox.json, there's a full message trail — who delegated what, what reports came back, what decisions were requested.

Curious about your markdown + git approach — do you get merge conflicts when multiple agents write to the same state files simultaneously? That was the main reason I went with JSON + async-mutex instead. Git history gives you great auditability but concurrent writes to the same file need some kind of coordination layer.
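For reference, the coordination problem async-mutex solves can be sketched with a minimal promise-chain lock — a simplified stand-in for the real library, not Mission Control's actual code:

```typescript
// Minimal promise-chain mutex: serializes writes to shared state so two
// agents can't interleave read-modify-write cycles on the same file.
class SimpleMutex {
  private tail: Promise<void> = Promise.resolve();

  runExclusive<T>(fn: () => Promise<T>): Promise<T> {
    const result = this.tail.then(fn);
    // Keep the chain alive even if fn rejects.
    this.tail = result.then(() => undefined, () => undefined);
    return result;
  }
}

// Usage sketch: all writers funnel through one mutex per file.
const tasksLock = new SimpleMutex();
const state = { tasks: [] as string[] };

async function addTask(name: string): Promise<void> {
  await tasksLock.runExclusive(async () => {
    // In the real system this would read tasks.json, mutate, and write back.
    state.tasks.push(name);
  });
}
```

The real async-mutex package does the same serialization with cancellation and timeouts on top; the point is that every writer awaits the previous writer's completion before touching the file.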


It's modular by design, but the UI is the primary interface — no standalone CLI yet.

On the opinionated/piecemeal question: the data layer is intentionally unopinionated. Everything is plain JSON files — tasks.json, inbox.json, decisions.json, etc. Any tool that can read/write local files can participate. I've had Claude Code, Cursor, and custom scripts all interacting with the same task queue without touching the web UI. So at the data level, it's fully piecemeal.

The UI is more opinionated — it assumes an Eisenhower matrix + kanban + inbox workflow. But you don't have to use all of it. Some people just use the kanban and ignore the priority matrix entirely.

A CLI is an interesting idea and something I've thought about. The API is already there (token-optimized REST endpoints for everything), so wrapping it in a mc tasks list --assignedTo=developer --kanban=in-progress style CLI would be straightforward. If that's something you'd use, I'd bump it up the roadmap — would you open an issue?

Haven't looked at beads specifically — could you share more about what you mean by tracking state with beads? Curious whether it maps to something Mission Control already does (the checkpoint/snapshot system saves full workspace state and lets you roll back) or whether it's a different paradigm entirely.

"Orchestrating amnesiacs" — that's a good way to put it, and it's a fair critique of any system spawning claude -p sessions.

Mission Control's mitigation is treating the shared data layer as the durable state rather than the session itself. Each agent session is ephemeral, but it reads from and writes to persistent JSON files — tasks, inbox, decisions, activity log — so the accumulated state of all previous sessions is available to the next one. There's also a compressed context snapshot (ai-context.md, ~650 tokens) that gets regenerated and gives each new session situational awareness: what's done, what's in progress, what's blocked, what decisions are pending.

So agents aren't fully amnesiac — they inherit the project state — but you're right that they can't replay their own previous reasoning. If an agent tried approach A in session 1 and it failed, session 2 only knows that if someone (or the agent itself) wrote it down in the task notes or activity log. That's a manual bridge, not a structural one.

Named, resumable sessions with full transcript replay would be a better foundation. That's more of a Claude Code platform feature than something I can build on top — but if claude -p ever supports session IDs or transcript export, wiring that into Mission Control's task history would be a natural fit.

Curious what you're using for the persistent session layer — custom wrapper around the API, or something off-the-shelf?


The compressed context snapshot is a smart workaround - basically hand-rolling what session continuity would give you for free.

For the persistent layer, we're using the API directly with conversation IDs stored our side, then replaying relevant message windows rather than full transcripts. Full replay gets expensive fast, so we filter to decision points and errors. Similar tradeoffs to your activity log, just at the API level rather than filesystem.


Yes. When you're making the task (or asking Claude Code to populate the mission/subtasks), you can mention which folder to output to (like in the task description or mission). So far I haven't specified, and it has just been dropping research reports in a /research folder that it made, a /projects folder for project outputs, etc.

This is exactly right — and it's why acceptanceCriteria is a first-class field in the task schema, not just a description. Every task has an explicit acceptanceCriteria: string[] array that defines what "done" actually means:

  acceptanceCriteria: [
    "All tests pass (pnpm test)",
    "No TypeScript errors (pnpm tsc --noEmit)",
    "File written to src/components/NewFeature.tsx",
    "Completion report posted to inbox"
  ]

When a task launches, those criteria get injected into the agent's prompt context alongside the task description, subtasks, and agent instructions. The agent sees exactly what "done" means before it starts working.
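A rough sketch of that injection step — the function and field names here are illustrative, not the actual implementation:

```typescript
// Illustrative: assemble the prompt context an agent sees at launch,
// with acceptance criteria rendered as an explicit checklist.
interface LaunchTask {
  title: string;
  description: string;
  subtasks: string[];
  acceptanceCriteria: string[];
}

function buildPrompt(task: LaunchTask): string {
  return [
    `Task: ${task.title}`,
    task.description,
    "Subtasks:",
    ...task.subtasks.map((s) => `- ${s}`),
    "Acceptance criteria (definition of done):",
    ...task.acceptanceCriteria.map((c) => `- [ ] ${c}`),
  ].join("\n");
}
```

Rendering the criteria as an unchecked checklist gives the agent a concrete artifact to tick off, rather than a vague notion of "done" buried in prose.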

You're also right that the deeper problem is "successfully completed the wrong thing." Retry logic assumes failure is obvious (exit code ≠ 0), but a task that silently drifts is harder to catch. The /ship-feature command enforces a verification step — runs tests, lints, and typechecks before marking anything complete — which catches a lot of the "it wrote code but nothing actually works" cases.
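A verification gate like that can be sketched as a list of checks that must all pass before the task transitions to done — the command list and injected runner here are illustrative, not the actual /ship-feature code:

```typescript
// Illustrative verification gate: every check must succeed before a task
// can be marked complete. The runner is injected so the real system can
// shell out to pnpm while tests can stub it.
type CheckRunner = (cmd: string) => Promise<boolean>;

const checks = ["pnpm test", "pnpm lint", "pnpm tsc --noEmit"];

async function verify(run: CheckRunner): Promise<{ ok: boolean; failed: string[] }> {
  const failed: string[] = [];
  for (const cmd of checks) {
    if (!(await run(cmd))) failed.push(cmd);
  }
  return { ok: failed.length === 0, failed };
}
```

Returning the list of failed checks (rather than a bare boolean) matters here: it's what lets the failure get written into the task notes or activity log so the next session knows why the attempt didn't count.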

That said, there's still a gap between "tests pass" and "this actually does what I asked." That's where the human-in-the-loop decisions queue helps — agents can post a decision request like "I implemented X, but the acceptance criteria mention Y. Should I continue?" — but making agents reliably self-evaluate against criteria is still an open problem.


Update: just shipped the loop detection + decision escalation I mentioned. Here's how it works now:

When you run a "continuous mission" (one click to execute an entire project), the daemon chains tasks automatically — as each finishes, the next batch dispatches based on dependency order. If an agent fails the same task 3 times in a row, loop detection kicks in and auto-creates a decision in the decisions queue with context about what failed and options (retry with a different approach, skip it, or stop the mission). The human gets an inbox notification and can answer from the UI.

It also posts a mission completion report to the inbox when everything finishes (or stalls) — task counts, file paths from the work, and a nudge to check the status board for anything left over.

Still not full self-evaluation in the "did I actually make progress?" sense — that's the next frontier. But the mechanical escalation path is wired end-to-end now. Code's on GitHub if you want to poke at it: https://github.com/MeisnerDan/mission-control
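Mechanically, the escalation path reduces to a per-task failure counter that converts the Nth consecutive failure into a decision instead of another retry — a simplified sketch, with illustrative names:

```typescript
// Simplified loop detection: after 3 consecutive failures on the same
// task, stop retrying and escalate to the human decisions queue.
const MAX_CONSECUTIVE_FAILURES = 3;

interface Decision {
  taskId: string;
  question: string;
  options: string[];
}

const failures = new Map<string, number>();
const decisionsQueue: Decision[] = [];

// Returns "retry" while under the threshold, "escalate" once it's hit.
function onTaskFailure(taskId: string, error: string): "retry" | "escalate" {
  const count = (failures.get(taskId) ?? 0) + 1;
  failures.set(taskId, count);
  if (count < MAX_CONSECUTIVE_FAILURES) return "retry";
  decisionsQueue.push({
    taskId,
    question: `Task ${taskId} failed ${count} times: ${error}. How should we proceed?`,
    options: ["retry with a different approach", "skip it", "stop the mission"],
  });
  return "escalate";
}
```

A successful attempt would reset the counter; the point is that the daemon never spins past the threshold without a human in the loop.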

Thanks, man! A landing page is definitely on the list — right now I'm focused on getting the core solid first, but you're right that it would help with discoverability. For now the GitHub README is doing double duty as the landing page.

And you don't actually need Claude Code to use it — Mission Control works with any AI agent that can read/write local files. The data layer is just JSON, and the API is token-optimized (~50 tokens per request vs ~5,400 unfiltered, about a 99% reduction), so it's lightweight for any agent to consume. The Eisenhower matrix, Kanban, goal hierarchy, and brain dump all work standalone too. The daemon and agent orchestration just layer on top when you're ready for it.


thanks bro for the mission. indeed building the core is more important. give value to initial users and then focus on making it mainstream. good luck

Great question — and I think you're right that self-evaluation is the harder problem.

Right now, Mission Control's daemon handles the mechanical side: exponential backoff retries (configurable), maxTurns and timeout limits per session to prevent runaway agents, and permanent failure after exhausting retries. But it's blunt.

That said, what MC does have is the plumbing for human escalation — an inbox system where agents can post decision requests, and a decisions queue where questions get surfaced to the human. But that's not wired into the daemon's failure path yet, which is an obvious next step.

I think the real answer is some kind of evaluation step between retries — "did this attempt make meaningful progress, or am I spinning?" — probably by having the agent review its own output against acceptance criteria before deciding to retry. That's on my radar, but I haven't built it yet.

Curious how you handle it with your STATE.md approach — do you have the agent evaluate its own progress, or do you review manually?
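For the retry side, the delay schedule is the standard exponential-backoff formula — a generic sketch with made-up defaults, not MC's actual parameters:

```typescript
// Generic exponential backoff with a cap: delay = base * 2^attempt,
// clamped to maxDelayMs so late retries don't wait forever.
// baseMs and maxDelayMs defaults are illustrative, not MC's settings.
function backoffDelayMs(attempt: number, baseMs = 1000, maxDelayMs = 60_000): number {
  return Math.min(baseMs * 2 ** attempt, maxDelayMs);
}
```

In practice you'd also add jitter so several failing agents don't all retry at the same instant.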
