What is duhwave
A single-agent harness is one loop: one prompt, one model, one tool registry. It exits when the work is done. duhwave keeps the loop alive past any single CLI invocation, accepts work from external triggers (webhooks, file watches, cron, MCP push), and lets a coordinator delegate to specialised workers via a shared substrate of variable handles instead of prose summaries.
The architecture composes from six primitives: an RLM substrate (one Python REPL subprocess per session, large inputs bound as named variables), a persistent Task lifecycle (forward-only state machine, three execution surfaces, JSONL on disk), a coordinator-as-prompt-role (one engine, no subclasses, tool filtering by allowlist), recursive cross-agent links (workers see scoped handles, results bind back as new handles), an event-ingress layer (five listeners, one immutable Trigger type, one append-only TriggerLog), and a topology DSL (one TOML file, packed and signed as a .duhwave bundle).
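To make "variable handles instead of prose summaries" concrete, here is a minimal sketch of the idea. It is not duhwave's implementation: `HandleStore` and its methods are hypothetical stand-ins, and an in-memory dict replaces the Python REPL subprocess. Agents pass names around and read bounded windows of the value.

```python
from dataclasses import dataclass, field


@dataclass
class HandleStore:
    """Hypothetical stand-in for the RLM substrate: values live here, agents hold names."""

    _values: dict[str, str] = field(default_factory=dict)

    def bind(self, name: str, value: str) -> str:
        """Store a large input under a name and return the handle (the name itself)."""
        self._values[name] = value
        return name

    def peek(self, handle: str, n: int = 80) -> str:
        """Return only the first n characters of the bound value."""
        return self._values[handle][:n]

    def search(self, handle: str, needle: str) -> list[int]:
        """Return the character offset of every occurrence of needle."""
        text = self._values[handle]
        hits, start = [], text.find(needle)
        while start != -1:
            hits.append(start)
            start = text.find(needle, start + 1)
        return hits

    def slice(self, handle: str, start: int, stop: int) -> str:
        """Return a bounded window of the bound value."""
        return self._values[handle][start:stop]


store = HandleStore()
# The coordinator binds a large input once; the bytes never enter a prompt.
repo = store.bind("repo", "def main():\n    pass\n" * 1000)
# A worker reads a window by handle and binds its result back as a new handle.
worker_result = store.bind("worker_1_result", store.slice(repo, 0, 20))
```

Only the handle names (`"repo"`, `"worker_1_result"`) would appear in an agent's context; the 21,000-character value stays in the store.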
Five ADRs
duhwave's design lives in five Architecture Decision Records. Each ADR ships with implementation under duh/duhwave/, unit and integration tests under tests/, and at least one runnable demo under examples/duhwave/.
| # | Title | Implementation | Tests | Demo |
|---|---|---|---|---|
| 028 | RLM context engine | duh/duhwave/rlm/ | tests/duhwave/rlm/ | examples/duhwave/01_rlm_demo.py |
| 029 | Recursive cross-agent links | duh/duhwave/coordinator/spawn.py | tests/duhwave/coordinator/ | examples/duhwave/02_swarm_demo.py |
| 030 | Persistent Task lifecycle | duh/duhwave/task/ | tests/duhwave/task/ | examples/duhwave/repo_triage/main.py |
| 031 | Coordinator role + event ingress | duh/duhwave/coordinator/ · duh/duhwave/ingress/ | tests/duhwave/ingress/ | examples/duhwave/03_event_driven.py |
| 032 | Topology DSL + bundles + control plane | duh/duhwave/spec/ · duh/duhwave/bundle/ · duh/duhwave/cli/ | tests/duhwave/spec/ · tests/duhwave/bundle/ | examples/duhwave/04_topology_bundle.py |
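ADR-030's "forward-only state machine" with "JSONL on disk" can be pictured roughly as follows. This is a sketch under assumptions: the state names, `advance`, and `record` are hypothetical, and an in-memory buffer stands in for the on-disk file. The real lifecycle under duh/duhwave/task/ may differ.

```python
import io
import json
from enum import IntEnum


class TaskState(IntEnum):
    # Hypothetical states; the actual ADR-030 state names may differ.
    PENDING = 0
    RUNNING = 1
    COMPLETED = 2
    FAILED = 3


TERMINAL = {TaskState.COMPLETED, TaskState.FAILED}


def advance(current: TaskState, target: TaskState) -> TaskState:
    """Allow only forward moves; terminal states accept no further transitions."""
    if current in TERMINAL:
        raise ValueError(f"{current.name} is terminal")
    if target <= current:
        raise ValueError(f"cannot move from {current.name} back to {target.name}")
    return target


def record(log, task_id: str, state: TaskState) -> None:
    """Append one JSON object per line (JSONL); earlier lines are never rewritten."""
    log.write(json.dumps({"task": task_id, "state": state.name}) + "\n")


log = io.StringIO()  # stand-in for a JSONL file on disk
state = TaskState.PENDING
record(log, "t1", state)
for target in (TaskState.RUNNING, TaskState.COMPLETED):
    state = advance(state, target)
    record(log, "t1", state)

history = [json.loads(line)["state"] for line in log.getvalue().splitlines()]
```

Because transitions only move forward and the log is append-only, replaying the file top to bottom is enough to recover a task's current state after a restart.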
Quick start
Install D.U.H., write a minimal swarm.toml, install the bundle, start the host daemon. The daemon stays alive until you stop it; triggers spawn Tasks; Tasks run agents; agents bind results back as REPL handles.
    # 1. Install D.U.H. (duhwave ships in the same package)
    pip install duh-cli

    # 2. Author a topology directory: foo.duhwave/{manifest.toml, swarm.toml, permissions.toml}
    #    See examples/duhwave/repo_triage/ for the canonical layout.

    # 3. Install the bundle into ~/.duh/waves/
    duh wave install ./foo.duhwave

    # 4. Start the host daemon
    duh wave start

    # 5. Inspect what is running
    duh wave ls
    duh wave inspect foo
    duh wave logs --follow foo

    # 6. Stop the daemon when done
    duh wave stop
A minimal swarm.toml:
    [swarm]
    name = "hello-wave"
    version = "0.1.0"
    format_version = 1

    [[agents]]
    id = "coordinator"
    role = "coordinator"
    model = "anthropic/claude-haiku-4-5"
    tools = ["Spawn", "SendMessage", "Stop", "Peek", "Search", "Slice"]
    system_prompt = "prompts/coordinator.md"

    [[agents]]
    id = "worker"
    role = "worker"
    model = "anthropic/claude-haiku-4-5"
    tools = ["Read", "Grep", "Peek", "Search", "Slice"]
    system_prompt = "prompts/worker.md"

    [[triggers]]
    kind = "manual"
    source = "hello"
    target_agent_id = "coordinator"

    [[edges]]
    from_agent_id = "coordinator"
    to_agent_id = "worker"
    kind = "spawn"

    [budget]
    max_concurrent_tasks = 3
    max_usd_per_day = 1.00
10 runnable demos
Every duhwave primitive is exercised by a runnable demo under examples/duhwave/. The parity_hermes/ set walks through five harness-parity demos; parity_claw/ walks through four control-plane demos. The headline showpieces (repo_triage/, agile_team/, real_e2e/, telegram_assistant/) are full end-to-end swarms.
| # | Demo | What it shows |
|---|---|---|
| 1 | 01_rlm_demo.py | RLM REPL substrate: bind a 235K-char repo, peek + search by handle, never inline the bytes |
| 2 | 02_swarm_demo.py | Coordinator + two workers via Spawn; result bind-back as a new handle |
| 3 | 03_event_driven.py | All five ingress listeners: webhook, filewatch, cron, MCP push, manual seam |
| 4 | 04_topology_bundle.py | Parse swarm.toml, pack a deterministic .duhwave ZIP, install with TOFU trust |
| 5 | repo_triage/main.py | ~400 LOC end-to-end: bundle build → install → daemon → trigger → orchestrate → inspect → stop |
| 6 | agile_team/main.py | 5-stage pipeline (PM → Architect → Engineer → Tester → Reviewer); stub or real OpenAI runner |
| 7 | real_e2e/main.py | Real daemon subprocess + real HTTP webhook + real OpenAI completion + real outbox file |
| 8 | telegram_assistant/main.py | Persistent assistant with three flows: inbound webhook, scheduled cron, on-demand manual |
| 9 | parity_hermes/run_all.py | Five demos: multimode adapters, tool-arg repair, parallel dispatch, RLM-replaces-compaction, shared budget |
| 10 | parity_claw/run_all.py | Four demos: four ingress channels, persistent state, concurrent ingress, per-channel isolation |
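Demo 4 packs a "deterministic" ZIP, meaning the same inputs yield byte-identical archives. One common way to get that property is sketched below with the standard-library `zipfile`: sort entries and pin every timestamp and permission. `pack_deterministic` is a hypothetical helper and an assumption about the technique, not the packer under duh/duhwave/bundle/. Signing is out of scope here.

```python
import hashlib
import io
import zipfile

FIXED_TIME = (1980, 1, 1, 0, 0, 0)  # earliest timestamp the ZIP format can represent


def pack_deterministic(files: dict[str, bytes]) -> bytes:
    """Pack files so identical inputs give identical bytes (same Python and zlib build)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in sorted(files):  # stable entry order regardless of dict order
            info = zipfile.ZipInfo(filename=name, date_time=FIXED_TIME)
            info.compress_type = zipfile.ZIP_DEFLATED
            info.external_attr = 0o644 << 16  # fixed permissions, no umask leakage
            zf.writestr(info, files[name])
    return buf.getvalue()


bundle_a = pack_deterministic({"swarm.toml": b"[swarm]\n", "manifest.toml": b"[bundle]\n"})
bundle_b = pack_deterministic({"manifest.toml": b"[bundle]\n", "swarm.toml": b"[swarm]\n"})
digest_a = hashlib.sha256(bundle_a).hexdigest()
digest_b = hashlib.sha256(bundle_b).hexdigest()
```

A stable digest is what makes a trust-on-first-use check meaningful: a reinstall can compare the digest it sees against the one it recorded the first time.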
Real-OpenAI benchmark
A single CLI invocation drives a 5-stage agile-team pipeline (PM → Architect → Engineer → Tester → Reviewer) against real OpenAI models. Each stage spawns a worker, reads exposed handles from the coordinator's RLM REPL, and binds its result back. The benchmark runs pytest against the produced implementation; the pass rate is the headline number.
| Metric | gpt-4o-mini | gpt-4o |
|---|---|---|
| Stages completed | 5 / 5 | 5 / 5 |
| Wall (single-threaded) | 35.5 s | 29.3 s |
| Total prompt tokens | 3,934 | 4,706 |
| Total completion tokens | 1,553 | 1,900 |
| Estimated cost | $0.0015 | $0.0308 |
| Cost ratio | 1× | ~20× |
| pytest on produced code | 3/5 pass | 5/6 pass |
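The derived rows follow directly from the raw figures in the table. A short check of the arithmetic, using only the numbers above:

```python
results = {
    "gpt-4o-mini": {"prompt": 3934, "completion": 1553, "cost_usd": 0.0015, "passed": 3, "total": 5},
    "gpt-4o": {"prompt": 4706, "completion": 1900, "cost_usd": 0.0308, "passed": 5, "total": 6},
}

# Cost ratio relative to gpt-4o-mini (the "~20×" row).
cost_ratio = results["gpt-4o"]["cost_usd"] / results["gpt-4o-mini"]["cost_usd"]

# Pass rate and total token count per model.
pass_rates = {model: r["passed"] / r["total"] for model, r in results.items()}
total_tokens = {model: r["prompt"] + r["completion"] for model, r in results.items()}

for model in results:
    print(f"{model}: {pass_rates[model]:.0%} pass, {total_tokens[model]} tokens")
print(f"cost ratio: {cost_ratio:.1f}x")
```

The ratio works out to about 20.5, and the pass rates to 60% and roughly 83%. The two runs produced different test suites (5 tests versus 6), so the pass rates are not measured against the same tests.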
Headline finding. Real coordination defects surface naturally: the Reviewer agent reads but does not execute, missing test failures the Tester introduced. gpt-4o-mini's test_error_handling fails because an earlier time.sleep(2) refills the bucket; gpt-4o's test_rate_limiter_thread_safety references threading.Thread but the test file imports only pytest and time. Both Reviewers issued APPROVE. The obvious next step: add a Runner role that executes the test suite and binds the failures back as a handle for the Reviewer to peek. Architecture composes.
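The proposed Runner role could be as small as the sketch below: run pytest in a subprocess, parse the `FAILED` lines of its short summary, and hand the structured list to whatever binds handles. `parse_failures`, `run_suite`, the test file name, and the sample text are all hypothetical. The sample is illustrative, not captured benchmark output, and no such role exists in the benchmark described here.

```python
import re
import subprocess
import sys

# Matches pytest short-summary lines such as:
#   FAILED path/to/test_file.py::test_name - SomeError: message
FAILED_LINE = re.compile(r"^FAILED (?P<node>\S+)(?: - (?P<reason>.*))?$")


def parse_failures(pytest_output: str) -> list[dict[str, str]]:
    """Extract failing test ids and reasons from pytest's short test summary."""
    failures = []
    for line in pytest_output.splitlines():
        match = FAILED_LINE.match(line.strip())
        if match:
            failures.append({"test": match["node"], "reason": match["reason"] or ""})
    return failures


def run_suite(test_dir: str) -> list[dict[str, str]]:
    """Run pytest on test_dir and return its failures (not invoked in this sketch)."""
    proc = subprocess.run(
        [sys.executable, "-m", "pytest", "-q", "-rf", "--tb=no", test_dir],
        capture_output=True,
        text=True,
        check=False,  # a failing suite is an expected outcome, not an error
    )
    return parse_failures(proc.stdout)


# Illustrative summary text modelled on the gpt-4o failure described above.
sample = """\
FAILED test_rate_limiter.py::test_rate_limiter_thread_safety - NameError: name 'threading' is not defined
1 failed, 5 passed in 0.12s
"""
failures = parse_failures(sample)
```

Bound back as a handle, a list like `failures` would give the Reviewer something concrete to peek at before issuing APPROVE.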
Full write-up with per-stage ledger, cost breakdown, and reproducibility steps: benchmarks/duhwave-agile/RESULT.md.
Architecture
duhwave's five ADRs form a dependency DAG. ADR-028 (RLM substrate) is the foundation; everything else builds upward.
        ┌────────────────────────────────────────┐
        │  ADR-032: topology                     │
        │  swarm.toml · .duhwave bundles ·       │
        │  duh wave CLI control plane            │
        └──────────────┬─────────────────────────┘
                       │ declares
        ┌──────────────▼─────────────────────────┐
        │  ADR-031: coordinator + ingress        │
        │  Role / tool filtering · 5 listeners · │
        │  SubscriptionMatcher · TriggerLog      │
        └───────┬────────────────────────┬───────┘
                │ spawns                 │ persists
    ┌───────────▼────────┐   ┌───────────▼────────────┐
    │  ADR-029: Spawn    │   │  ADR-030: Task         │
    │  cross-agent       │   │  state machine         │
    │  handle exposure ──┼──▶│  3 execution surfaces  │
    │  + bind-back       │   │  JSONL persistence     │
    └──────────┬─────────┘   └───────────┬────────────┘
               │                         │
               │      both stand on      │
               ▼                         ▼
        ┌────────────────────────────────────────┐
        │  ADR-028: RLM context engine           │
        │  one Python REPL subprocess per        │
        │  session · handles · Peek / Search     │
        │  / Slice · cycle-detected recursion    │
        └────────────────────────────────────────┘
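Read as a dependency graph, the diagram gives a bottom-up order in which the ADRs can be understood or built. A small sketch with the standard-library `graphlib` follows. The edge set is an assumption taken from the vertical arrows only; the horizontal arrow from ADR-029 to ADR-030 is left out because its direction of dependency is not stated.

```python
from graphlib import TopologicalSorter

# Each key depends on the ADRs in its set (vertical arrows in the diagram).
depends_on = {
    "ADR-032": {"ADR-031"},             # topology declares coordinator + ingress
    "ADR-031": {"ADR-029", "ADR-030"},  # coordinator spawns workers, persists Tasks
    "ADR-029": {"ADR-028"},             # Spawn stands on the RLM substrate
    "ADR-030": {"ADR-028"},             # Task stands on the RLM substrate
    "ADR-028": set(),                   # foundation: no dependencies
}

# static_order() yields every node after all of its dependencies.
build_order = list(TopologicalSorter(depends_on).static_order())
print(" -> ".join(build_order))
```

ADR-028 always comes first and ADR-032 last; the relative order of ADR-029 and ADR-030 is unconstrained by these edges.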
See also
- Cookbook: build your own swarm on D.U.H. · the canonical walkthrough
- All ADRs on GitHub · ADR-028 through ADR-032 cover duhwave
- examples/duhwave/ · 10 runnable demos
- benchmarks/duhwave-agile/RESULT.md · real-OpenAI agile-team benchmark
- Zhang, Kraska, Khattab: Recursive Language Models (arXiv 2512.24601)
- Yang, Zou, Pan et al.: Recursive Multi-Agent Systems (arXiv 2604.25917)
- github.com/nikhilvallishayee/duh · D.U.H. repo