yeah went back and forth on exactly this trade-off, you're right that vertical can produce tests tailored to the impl. horizontal forces invariants up front but the failure mode flips: you're tailoring tests to the architecture you imagined before any feedback from working code. so it's invariants-vs-behaviors, both have a tailoring failure mode, just on different axes. compromise i landed on: vertical + an explicit anti-tailoring grill check at each cycle. definitely gonna tweak it more as i keep refining.
1): you have things backwards, the EvanFlow is not something i came up with but rather something i discovered, similar to the dao. i am named Evan after the EvanFlow, not the other way around.
2): you're right, and dmitry called this out below too. shipped a fix that makes REFACTOR per-cycle instead of a deferred "after all tests pass" step. the old step 4 was iterate-shaped, not TDD-shaped.
fair to call out but half true. i did send claude off to look up specific stats on failure modes (62% assertion correctness, etc), but the design decisions came from my own reading of anthropic's reports, the columbia daplab paper i cited, and a mix of matt pocock's lectures + my own anecdotal experience running this loop on real projects.
mmm good point! just shipped a fix that puts RED → GREEN → REFACTOR per cycle with the fresh test as safety net, just like beck intended. macro/cross-cycle refactor lives in iterate now as its own separate thing so the two don't get conflated. thanks for the catch : )
yeah that is a little confusing, tdd is actually a substep of execution. it was listed separately in the diagram because not every task uses TDD (config, generated types, and the like skip it), so the skill is invoked conditionally during execution rather than always. but the arrow notation made it look sequential when it's actually nested. updated the README diagram to show that. thanks for the nudge.
Built this as an opinionated Claude Code development flow based on evidence-based practices and what has been working for me while writing professional code.
EvanFlow is a single TDD-driven loop. Say "let's evanflow this" and it walks brainstorm → plan → execute → tdd → iterate → STOP. Real checkpoints at design and plan approval. Never auto-commits, never auto-stages, never proposes integration - every git op is your call.
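The phase walk with hard checkpoints can be sketched like this. This is a toy model, not the actual skill: the phase names come from the post, but the gate interface (`approve` callback) is my own assumption about how "real checkpoints" behave.

```python
# Hypothetical sketch of the EvanFlow phase walk. Phase names are from
# the post; the approval-gate mechanics are an illustrative assumption.
PHASES = ["brainstorm", "plan", "execute", "tdd", "iterate", "stop"]
CHECKPOINTS = {"brainstorm", "plan"}  # design approval + plan approval

def walk(approve):
    """Run phases in order; halt at a checkpoint the user doesn't approve."""
    completed = []
    for phase in PHASES:
        completed.append(phase)
        if phase in CHECKPOINTS and not approve(phase):
            return completed  # hard stop: no auto-advance past a gate
    return completed

walk(lambda p: True)         # all gates approved: full run through STOP
walk(lambda p: p != "plan")  # plan rejected: halts right after planning
```

The point of the gate-after-phase shape is that the agent always produces the artifact (design, plan) but never acts on it without an explicit yes.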
The three things that actually changed how I work:
1. Vertical-slice TDD. One failing test → minimal impl → next test. Watch each test fail before writing the impl that passes it. (Sounds obvious. Almost no agent does it by default. ~62% of LLM-generated test assertions are wrong per HumanEval research, so test discipline matters even more than impl discipline.)
2. Embedded grilling at decision points. Before locking a plan: what breaks if a user does X? What's the rollback? What's explicitly out of scope? Catches design flaws while they're still cheap.
3. Iterate-until-clean (hard cap of 5 rounds). Re-read the diff for dead code, naming, the deletion test, and assertion correctness, plus a Five Failure Modes pass (hallucinated actions, scope creep, cascading errors, context loss, tool misuse). For UI: screenshot via headless Chromium.
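The discipline in item 1 ("watch each test fail first") can be made mechanical. A minimal sketch, assuming callable `test`/`write_impl` hooks; the names and the toy `slugify` example are illustrative, not from the skill:

```python
# Sketch of one vertical-slice TDD cycle: the test MUST fail before the
# impl exists, otherwise it proves nothing. All names here are made up.
def red_green(test, write_impl):
    """One cycle: confirm RED (test fails), then implement to GREEN."""
    if test():                      # passing before any impl? bad test
        raise RuntimeError("test never failed -- it proves nothing")
    write_impl()                    # minimal change to pass this one test
    assert test(), "impl did not turn the test green"

# toy example: slugify starts unimplemented, so the test fails first
state = {"impl": None}
def test():  return state["impl"] is not None and state["impl"]("A B") == "a-b"
def impl():  state["impl"] = lambda s: s.lower().replace(" ", "-")
red_green(test, impl)   # RED observed, then GREEN
```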
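Item 3's capped loop is roughly this shape. The check names come from the list above; the `review`/`fix` interface (each returning/consuming a list of findings) is my assumption for illustration:

```python
# Sketch of the iterate-until-clean loop, hard-capped at 5 rounds.
MAX_ROUNDS = 5
CHECKS = ["dead code", "naming", "deletion test", "assertion correctness",
          "hallucinated actions", "scope creep", "cascading errors",
          "context loss", "tool misuse"]

def iterate_until_clean(review, fix):
    """Re-review the diff each round; stop when clean or at the cap."""
    for round_no in range(1, MAX_ROUNDS + 1):
        findings = review(CHECKS)        # e.g. re-read the diff per check
        if not findings:
            return round_no              # clean: stop early
        fix(findings)                    # address and go around again
    return MAX_ROUNDS                    # hard cap: stop regardless

# simulated run: findings shrink to nothing by round 3
queue = [["dead code"], ["naming"], []]
iterate_until_clean(lambda c: queue.pop(0), lambda f: None)
```

The hard cap is the interesting design choice: without it, "iterate until clean" is an unbounded loop for an agent that keeps inventing new nits.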
For bigger plans with 3+ independent units sharing types, it forks into a parallel coder/overseer orchestration. Integration tests at touchpoints ARE the cohesion contract.
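What "integration tests at touchpoints ARE the cohesion contract" means in practice: two independently-built units share a type, and a test at their seam pins the contract. A minimal illustration with hypothetical names (`Order`, `price`, `invoice`), not the flow's actual code:

```python
# Illustrative touchpoint test for the parallel coder/overseer setup:
# unit A and unit B are built independently but share the Order type;
# the test at their seam is the cohesion contract.
from dataclasses import dataclass

@dataclass
class Order:            # the shared type both units depend on
    items: list
    total_cents: int

def price(items):       # unit A: produces the shared type
    return Order(items=items, total_cents=sum(c for _, c in items))

def invoice(order):     # unit B: consumes it
    return f"{len(order.items)} items, ${order.total_cents / 100:.2f}"

# the touchpoint test: A's output must satisfy B's expectations
order = price([("widget", 250), ("gadget", 1050)])
assert invoice(order) == "2 items, $13.00"
```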
Three install paths: Claude Code plugin marketplace, npx skills add, manual copy. MIT.