Hacker News | mschwarz's comments

Day to day I run a dev pod (implementor + QA + frontend design), a review pod doing adversarial review with one Claude and one Codex, and an orchestrator pair. To illustrate real work getting done: the longest single rig I've kept running continuously was about four days. That was a large implementation spec executed with a test-driven dev approach from obra's superpowers, plus independent deep contextual code reviews at milestones (my own skill pack), plus automated Vercel agent-browser testing along the way. So currently it's a closed SDLC loop limited only by the amount of work I give it. The "babysitting agents" part moves me up a layer to watching for spec drift and handling weird edge cases that come up. So it's not set-and-forget, but you can definitely have it work on something real overnight and get that "my agents shipped code while I slept" kind of outcome. I watch a demo video in the morning to see what they built, then do my own code-review spot checks on the PRs.

The original motivation for making OpenRig is that this pattern works well. I've been doing this for months now, and I'm sure many people have gotten something like it working, but the topology is fragile: sessions die, your laptop needs a reboot, and you lose the setup that took weeks to perfect. OpenRig makes the topology itself a first-class thing, like a docker-compose but for the topology of Claude Codes / Codexes on your machine and all their specific context and configs you fine-tuned.

Regarding supervision - that is the key question for sure. I can't really babysit more than 4-5 agents without feeling like I've lost the plot a bit. So the demo pod in the onboarding includes an example of a pattern I use where two orchestrators run as a "high availability" pair, and I really only interact with one agent per workstream: the orch-lead. The peer is there to monitor, absorb the lead's mental model in realtime, and take over the rig if the lead hits its context limit or something else goes wrong.


What use cases did you find this approach works for, and what doesn't? Any observations on what topologies work better?

I tried doing the same for maintaining OSS projects. So far, the best I could manage is getting the agents to autonomously do ~80% of the work. But then I have to manually review each potential PR, and in almost every case work further with an agent, guiding it live, to fix things. This takes about as much time as working without the swarm. So far I've found the swarm most useful for the initial scouting: mapping out what work needs to be done in the first place and storing it in a nice JSON file.

From my observations, all it takes is one mistake from an agent; from there, the architecture just snowballs into chaos as future work builds on top of the incorrect initial approach.


Yeah, I can definitely relate to the snowballing. I'm mostly building web apps (Python/TypeScript), so YMMV. Have you tried pairing Codex with Claude? It's the gateway drug for agent topologies and definitely worth trying. Claude is better at understanding your intent, but makes lots of mistakes. Codex makes fewer mistakes, but over-engineers. Together they're not perfect but significantly more accurate; they complement each other well. So Codex reviews Claude, and using TDD is even better because Codex will gate each change Claude makes. You can apply this pattern to implementation, reviews, PM, even research.

OpenRig has a spec called implementation-pair which lets you try this pretty easily. There's another called adversarial-review, which is the same topology with different starter context/instructions that make the agents less constructive and more combative. You'll get a feel for which one a task needs pretty quickly. Lots of people have made this pattern into skills, but I think OpenRig is probably the easiest happy path to try it, because the two agents can literally type into each other's terminals using "rig send" and "rig capture" and see each other's screens using tmux, as if you were the one typing the commands. Now you just sit back and watch them find and fix bugs. You don't need OpenRig to do this, just tmux, but raw tmux is a little fiddly to get working, which is why I made the rig send command as a tmux wrapper.
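
To show how little machinery this takes, here's a rough sketch of what a rig send / rig capture style wrapper reduces to (the function names, pane names, and structure are my guesses for illustration, not OpenRig's actual code):

```python
import subprocess

def send_cmd(pane: str, text: str) -> list[str]:
    # argv that types `text` into the target tmux pane, then presses Enter
    return ["tmux", "send-keys", "-t", pane, text, "Enter"]

def capture_cmd(pane: str) -> list[str]:
    # argv that prints the pane's currently visible screen to stdout
    return ["tmux", "capture-pane", "-t", pane, "-p"]

def rig_send(pane: str, text: str) -> None:
    # inject keystrokes into another agent's terminal
    subprocess.run(send_cmd(pane, text), check=True)

def rig_capture(pane: str) -> str:
    # read back whatever that agent currently has on screen
    out = subprocess.run(capture_cmd(pane), check=True,
                         capture_output=True, text=True)
    return out.stdout

# e.g. a reviewer agent asking its pair to rerun a failing test:
#   rig_send("claude:0.0", "pytest tests/test_auth.py -x")
#   print(rig_capture("claude:0.0"))
```

That's the whole trick: tmux is the transport, so either agent (or you) can drive any pane.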


OP here. Happy to answer questions or go deep on specifics.

Some topics I've been asked about: tmux as a transport primitive (actually a pretty nice unlock), how snapshot/restore actually works in practice, how this differs from a harness framework, why I didn't just build this into Claude Code, why I think the topology layer needs to stay independent from any one vendor's platform, etc.

(14 years lurking on HN. First post.)


Did you try llamaparse from Llamaindex? It’s a cloud service with a free tier. Recently switched to it from unstructured.io and it works great with the kinds of images and table graphics I feed it.


Why does the author claim that Adept was acquired by Amazon? The linked article says they hired away the CEO and key staff.


It was a weirdly structured deal that in effect was an acquisition. The investors were paid off.


I want this to be legit, but I fear we’ve entered the ICO phase of the AI boom


He said 4 x 5000 daily, which is considered a large dose


They didn't use an x, they used a hyphen.

> I began megadosing 4-5000 IUs daily

That reads as "four to five thousand IUs."


Can you elaborate on the difference between the terms “late untreated borreliosis” and “chronic Lyme disease”? Borreliosis is just another name for Lyme disease. What distinction are you so sure about that I’m missing?


Chronic Lyme is a vague term that usually refers to people who have already been treated but continue to have symptoms, and may not test positive. It's a grab bag of stuff.


I have heard this claim before but couldn't follow the logic; I'd genuinely like to understand your perspective.

So “late untreated borreliosis” is a “real thing” but if someone gets borreliosis, gets treated, yet their symptoms persist (this scenario is what people usually mean by the term chronic Lyme) then that is NOT a “real thing”?

Does this mean that treatment is 100% effective or that if it didn’t work, then it wasn’t borreliosis to begin with?


Don't expect a good answer. That brand of skepticism is performative rationality devoid of actual critical thinking.


Reading this was very cathartic, I was nodding along and laughing as I had the exact same journey of WTF. And all along I just assumed I was 100% of the problem.

Hearing this perspective helps put my frustration in context. I need to lower my expectations and just get used to its quirks. Despite its issues I've had a ton of fun building with langchain and will keep using it.


Amen! I had the exact same reaction. Like the author I eventually threw my hands in the air and started rolling my own solution that is already getting most of what I was interested in done without the hassle.


It’s great to see more companies focusing on inflammation. Is it possible to create a device like a continuous glucose monitor but for inflammation?


The glucose monitor warns you to take insulin, but the question for me is what action should be taken once high inflammation is detected? It seems like when an alert goes off, it's just a warning to go see a doctor.

The treatments for inflammation, and the indication of what specifically is causing it, seem rather limited to me beyond antibiotics or anti-inflammatory meds.

Ibuprofen is fairly flawed as a regular treatment for many because of the ulcer risks...


There are many suggested treatments for inflammation (OP mentions ginger); my guess is that, like everything else with the body, certain treatments work for some people and not others. CGMs are now used not just for diabetes but for a tighter feedback loop on which foods, etc. spike glucose, which lets a person iterate on their lifestyle faster. It's the same with inflammation: if you have chronic inflammation and are trying to reduce it with diet, exercise, medication, or naturopathic treatments, a tighter feedback loop would be a game changer, assuming it's possible. Perhaps CRP doesn't respond to changes in the body as quickly as glucose does, and measuring it all the time wouldn't gain much.


Great comment! We believe there is a future where continuous inflammation testing will be possible. Here's a team already working on building it: https://onlinelibrary.wiley.com/doi/pdf/10.1002/admt.2021013...

To your point, we see it as a potentially tighter feedback loop for seeing the impact of lifestyle changes.


Agreed, that only highlights the need to have a network of reliable places someone can go in order to have personally tailored care.

Just finding a doctor who can see you immediately in the US without racking up expensive emergency-room fees is darn near impossible now... It always comes back to our ritually broken health care system. :(


One interesting thing about going to continuous (or near-continuous) testing is you can get much narrower error bars on your baseline.

If you test once a month, how do you know if you accidentally tested during a transient spike?

If you test continuously, you can roll up those measurements to get things like p95, stddev, whatever.
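
To make the rollup concrete, here's a toy sketch with made-up hourly readings (the numbers are invented for illustration, not real CRP data):

```python
import statistics

# hypothetical hourly CRP readings (mg/L): a flat baseline
# plus one transient spike at hour 10
readings = [0.9, 1.0, 1.1, 0.8, 1.0, 0.9, 1.2, 1.0,
            0.9, 1.1, 4.5, 1.0, 0.9, 1.0, 1.1, 0.9]

def p95(samples):
    # 95th percentile via sorting (fine for small rollup windows)
    s = sorted(samples)
    return s[int(0.95 * (len(s) - 1))]

baseline = statistics.median(readings)  # robust to the transient spike
spread = statistics.stdev(readings)     # error bars on the baseline

# a single monthly test could easily have landed on the 4.5 spike;
# the rollup makes it obvious that's an outlier, not the baseline
print(baseline, round(spread, 2), p95(readings))
```

A single-sample protocol can't distinguish the 4.5 reading from a genuinely elevated baseline; the rollup can.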

So it’s useful even if you don’t want to respond in minutes to a spike. (You probably don’t need per-second readings for this, hourly would be enough.)

I don’t know if transient spikes are considered a risk factor, but you also get more chance to resolve those too.


Lifestyle changes including those related to nutrition, exercise, stress, and sleep can all improve inflammation levels. See our blog for a list of lifestyle interventions associated with decreased inflammation levels: https://www.begolden.online/post/lifestyle-interventions-ass...

Each person will respond differently so we suggest testing what works best for you.


We think so! We have started to see teams working on this and couldn't be more excited: https://onlinelibrary.wiley.com/doi/pdf/10.1002/admt.2021013...


Reminds me of Forward Error Correction [1], a technique used by satellite providers and even WAN-optimization vendors like Silver Peak to "erase" packet-loss events by injecting parity packets into the flow, which can be used to rebuild lost packets at the receiving end if needed. This prevents TCP global synchronization, aka the throughput see-saw described in the article. The problem isn't limited to high latency / satellite links; it exists on any path with packet loss, like your internet connection, or even MPLS.

[1] http://en.m.wikipedia.org/wiki/Forward_error_correction
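
The core idea is easy to sketch. Here's a toy XOR-parity version (real FEC codes like Reed-Solomon handle multiple losses; this only shows how one lost packet per group is rebuilt without a retransmit):

```python
from functools import reduce

def xor_parity(packets):
    # parity packet: byte-wise XOR of every packet in the group
    # (assumes equal-length packets, as a real scheme would pad to)
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), packets)

def recover(received, parity):
    # XOR-ing the parity with all surviving packets yields the missing one
    missing = received.index(None)
    survivors = [p for p in received if p is not None] + [parity]
    rebuilt = xor_parity(survivors)
    return received[:missing] + [rebuilt] + received[missing + 1:]

group = [b"pkt1", b"pkt2", b"pkt3", b"pkt4"]
parity = xor_parity(group)  # sent alongside the data packets

# receiver loses packet 2 but never has to ask for a retransmit:
lost = [b"pkt1", None, b"pkt3", b"pkt4"]
assert recover(lost, parity) == group
```

The cost is the extra parity bandwidth; the win is that the sender never sees the loss, so TCP never backs off.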

