Modelled on a real /develop

A spec goes in. An audited, committed branch comes out.

Fire one prompt and walk away. No hands on the wheel between plan, build, audit, and sign-off: /develop hands back a branch that already cleared your gates. Here's the machinery, and how to stand up your own.


This page is the concept: what orchestrated /develop is and why. To run it on your own project, head to Use the plugin.

Skill: a playbook · Agent: a worker · Nest: depth · Fork: breadth · Gate: evidence

The shift · manual vs orchestrated

Same feature: you babysit it, or you don't

Build it by hand and you're at the wheel the whole time: re-prompting at every handoff, eyeballing "done," looping back each time CI rejects. Hand the same spec to an orchestrator and you kick it off and walk away: it walks a durable plan, clears mechanical gates, resumes itself if it crashes, and hands back an audited branch.

By hand · before

You run the session. Drive every handoff, follow it to the end.

You babysit every seam: re-prompt at each handoff, re-explain what the last step already knew.
Context leaks at every hop; controls pass on judgement and get waved through under pressure.
Reviewers re-find the same defect classes every PR.

Orchestrated · after

Fire-and-forget. Hand it a spec, walk away; it resumes itself.

State lives in the plan file, the system of record. A crash resumes by re-reading it, with no one watching.
Controls clear on evidence: a command ran and produced a result, not a nod.
Every escaped finding books a new control at the right layer: a hook, gate, rule, agent, or plan. The line tightens itself.
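
The first two points can be sketched in a few lines of Python: a plan file that acts as the system of record, and phases that clear only on the exit code of a command that actually ran. This is a minimal sketch under stated assumptions. The JSON layout, the status strings, and the names `run_gate` and `walk_plan` are hypothetical illustrations, not /develop's real plan format or code.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path


def run_gate(command: list) -> dict:
    """Run a command and return the evidence: what ran and how it exited."""
    result = subprocess.run(command, capture_output=True, text=True)
    return {
        "command": command,
        "exit_code": result.returncode,
        "passed": result.returncode == 0,
    }


def walk_plan(plan_path: Path) -> None:
    """Walk phases in order, writing status back to the plan file each time.

    Status is persisted after every phase, so a crashed run resumes simply
    by calling walk_plan again on the same file.
    """
    plan = json.loads(plan_path.read_text())
    for phase in plan["phases"]:
        if phase["status"] == "done":
            continue  # cleared in an earlier session, skip it
        evidence = run_gate(phase["gate"])
        phase["evidence"] = evidence
        phase["status"] = "done" if evidence["passed"] else "blocked"
        plan_path.write_text(json.dumps(plan, indent=2))
        if not evidence["passed"]:
            break  # stop at the first gate that fails


if __name__ == "__main__":
    # Hypothetical gate: a trivial command that exits 0.
    passing = [sys.executable, "-c", "pass"]
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "plan.json"
        path.write_text(json.dumps({"phases": [
            {"name": "build", "status": "pending", "gate": passing},
            {"name": "audit", "status": "pending", "gate": passing},
        ]}))
        walk_plan(path)
        print([p["status"] for p in json.loads(path.read_text())["phases"]])
```

The design choice worth noticing: the loop keeps nothing in memory that is not also on disk, so "resume" is the same code path as "start".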

Why this design won · the evidence, honestly

Four methods, one scorecard

We didn't land here by taste. Each generation closed a specific gap in the last. The scorecard compares four methods: Vanilla, plus three generations of the skill (ad-hoc, dynamic workflow, and today's plan-walking orchestrator), with representative token and agent counts from a real project's runs (anonymized).

measured · effective tokens per run, representative figures from a real project's runs (anonymized)

These are effective tokens per run: the raw total down-weighted by how each token actually bills. Raw totals look alarming (the current method's is ~570M), but cache reads are ~95% of that volume and bill at ~1/10 the input rate, so counting them at face value overstates real spend by roughly 5-6× (more for single-context methods, less for ones that fan out fresh agents). Weighted for that, a run is closer to ~75-125M effective tokens: still large, but the honest figure. (Output is under 1% of the raw total, which is why an earlier output-only ~3M number looked absurd against a real run.)

Effective medians sit close: Skill Only ~75M · Mechanical ~110M · Dynamic Workflow ~120M · Vanilla ~125M. The spread is the story: Dynamic Workflow's fan-out gives it the worst tail (~310M effective on one run), Skill Only has the widest range, and Vanilla has the heaviest steady median. Spend isn't what separates these methods; reliability and predictability are.
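
The cache-read weighting described above is simple arithmetic, sketched here in Python. Only the ~95% cache-read share and the ~1/10 weight come from this page. Billing every other token at a flat weight of 1.0 is a simplifying assumption, and `effective_tokens` is a hypothetical helper, so the result will not reproduce the measured per-method figures exactly.

```python
def effective_tokens(raw_total: float,
                     cache_read_share: float,
                     cache_read_weight: float = 0.1) -> float:
    """Down-weight cache reads; count all other tokens at face value.

    Assumption: non-cache tokens all bill at the same rate (weight 1.0).
    Real billing treats output and cache writes differently.
    """
    cache_reads = raw_total * cache_read_share
    other = raw_total - cache_reads
    return cache_reads * cache_read_weight + other


raw = 570e6  # the raw total quoted for the current method
eff = effective_tokens(raw, cache_read_share=0.95)
print(f"effective: {eff / 1e6:.0f}M, raw overstates by {raw / eff:.1f}x")
```

Under this simplification the raw total shrinks to somewhere in the low 80Ms, inside the ~75-125M band quoted above. The overstatement factor moves with the cache-read share, which is why it differs between single-context methods and ones that fan out fresh agents.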

How to read this: Accuracy and Quality are directional, reasoned from architecture, since no ground-truth scores exist. Wall-clock, Tokens/run, and agent counts are grounded in real measurements from a real project's runs (anonymized), across 10-27 sampled runs per method. Tokens/run is effective (raw total, cache reads weighted at their ~1/10 billing rate): the fair per-run figure, not the scary raw volume. Wall-clock is relative: a multiplier against the leanest method, not absolute hours. It measures session lifetime (a run resumed later for PR-review/CI reads far longer than its core loop), so treat it as a directional ordering, not a clock. The most thorough method runs longest by design: it checks the most. Agent counts are the empirical run total; a few multi-day "epic" runs were excluded as outliers. A controlled bake-off would add measured accuracy and apples-to-apples per-run numbers.

Watch it run · animated · true to the skill

The orchestrator walks a tree

The walk is a tree, not a line. Control runs left → right along the phases, dipping down into each phase's agents and back up before the next: the "back and forth," without crossing wires. Pick a size; the tree grows real nesting, forking, and phases as scope rises. All of it mirrors what /develop does: lean by default, with forks and ladder-climbs only where they fire.

Why it compounds

The residual-feedback flywheel

Most pipelines run at flat quality. This one compounds: every run's leftovers book new controls at the right layer (a hook, gate, rule, agent, or plan), the pipeline strengthens, and audit/tidy load amortises toward an irreducible floor. The wheel is that loop: watch the controls accrue and preventable escapes run to zero.

A run settles; the audit + tidy tail logs every residual finding.
Each is marked preventable (some check could have required it) or irreducible (the floor: "compiles but subtly wrong").
For each preventable one, route it to the cheapest lever that catches it earliest. That might be a hook, gate, rule, or agent; it is not always a plan step.
Next run, that lever fires at the right layer, and that defect class stops escaping.
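
The four steps above can be sketched as a small routing loop in Python. This is an illustration under assumptions, not the skill's actual mechanism: the `Lever` cost ordering (hook cheapest, plan step most expensive) is assumed from the order the page lists them in, and `Finding`, `route`, and `settle_run` are hypothetical names.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Lever(IntEnum):
    """Control layers named on this page. The cost ordering is an assumption."""
    HOOK = 1
    GATE = 2
    RULE = 3
    AGENT = 4
    PLAN = 5


@dataclass(frozen=True)
class Finding:
    defect_class: str
    catchable_by: frozenset = frozenset()  # empty means irreducible


def route(finding: Finding) -> Optional[Lever]:
    """Pick the cheapest lever that could catch this finding, if any."""
    if not finding.catchable_by:
        return None  # irreducible: no check could have required it
    return min(finding.catchable_by)


def settle_run(findings: list, controls: dict) -> list:
    """Return what escaped this run, booking a control per preventable escape."""
    escaped = [f for f in findings if f.defect_class not in controls]
    for f in escaped:
        lever = route(f)
        if lever is not None:
            controls[f.defect_class] = lever
    return escaped


if __name__ == "__main__":
    controls = {}
    findings = [
        Finding("unformatted file", frozenset({Lever.PLAN, Lever.HOOK})),
        Finding("compiles but subtly wrong"),
    ]
    for run in (1, 2):
        escaped = settle_run(findings, controls)
        print(run, [f.defect_class for f in escaped])
```

On the second pass the preventable class no longer escapes because its control is already booked, while the irreducible finding still does. That remainder is the floor the flywheel amortises toward.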

Your turn

How to build your own

Strip the project specifics and this is a handful of reusable parts. Start small (one generalist executor, a plan file, real controls), then add specialists, forks, and feedback where the pain shows.

You don't have to hand-roll it. The develop plugin's /develop:init discovers your repo's real gates and scaffolds exactly this shape, fitted to your stack, not transplanted. The parts below are what it builds, so you can understand it, own it, and grow it.

Three primitives, and that's it

Everything here is one of three things. Learn the trio; the rest is composition.

skill · Playbook

A reusable procedure: what runs, in what order. Called by name (/develop, /plan-work, /tidy). It directs; it rarely lifts.

e.g. /develop, /plan-work, /audit-feature

agent · Worker

A desk with its own fresh context and a narrow remit. Hand it a brief, it does one job, returns a short result. One desk filling its context never touches the next.

e.g. backend-scaffolder, feature-validator, reviewers

orchestrator · Conductor

The thin loop that walks the plan and dispatches the desks. It holds only phase status, never the heavy reading, so it clears dozens of agents without filling up.

e.g. the /develop main loop walking PN nodes

The unlock: forking and nesting

How one orchestrator becomes dozens of coordinated desks without filling its own context. Nesting buys depth; forking buys breadth.

Together, that's the whole trick. The orchestrator nests one executor per phase (depth, clean context per slice); audit / plan / tidy fork independent reads that converge (breadth, verification). No agent ever holds the whole book, which is why the line scales past any single context.
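
A rough Python sketch of the two moves together, assuming plain functions stand in for agents. `run_executor`, `fork_reviews`, and `orchestrate` are hypothetical names, and a thread pool is only a stand-in for dispatching real agents with fresh contexts.

```python
from concurrent.futures import ThreadPoolExecutor


def run_executor(phase: str) -> str:
    """Nested worker: does the heavy reading locally, returns a short result."""
    # Stand-in for a full context of files and logs; it never leaves this call.
    context = [f"{phase}: chunk {i}" for i in range(1000)]
    return f"ok ({len(context)} chunks read)"


def fork_reviews(subject: dict, reviewers: list) -> bool:
    """Fork independent reads of the same subject, then converge on one verdict."""
    with ThreadPoolExecutor() as pool:
        verdicts = list(pool.map(lambda review: review(subject), reviewers))
    return all(verdicts)


def orchestrate(phases: list, reviewers: list) -> dict:
    """Thin loop: holds only per-phase status, never the workers' context."""
    status = {}
    for phase in phases:
        status[phase] = run_executor(phase)  # nest: one executor per phase
    passed = fork_reviews(status, reviewers)  # fork: breadth for verification
    status["audit"] = "pass" if passed else "fail"
    return status


if __name__ == "__main__":
    reviewers = [
        lambda s: all(v.startswith("ok") for v in s.values()),
        lambda s: len(s) > 0,
    ]
    print(orchestrate(["plan", "build"], reviewers))
```

The orchestrator's `status` dict grows by one short string per phase, however much each executor read. That is the property that lets the loop dispatch many workers without filling its own context.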

Patterns worth stealing

Seven portable moves: each a principle the pipeline leans on, most mapping straight to Anthropic's Building Effective Agents (tagged ≈). The takeaway on each is the part that travels to any stack.

The cast: every agent and skill

The moving parts of /develop: what each one does and the exact file you'd write.

The files you'd write

The real shape of each file type, frontmatter and all.