duhwave · persistent agentic-swarm extension

duhwave

A layered runtime on top of D.U.H. that turns single-shot agent invocations into a persistent, event-driven swarm. Five ADRs, ten runnable demos, one TOML file per topology.

What is duhwave

A single-agent harness is one loop: one prompt, one model, one tool registry. It exits when the work is done. duhwave keeps the loop alive past any single CLI invocation, accepts work from external triggers (webhooks, file watches, cron, MCP push), and lets a coordinator delegate to specialised workers via a shared substrate of variable handles instead of prose summaries.
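
To make "variable handles instead of prose summaries" concrete, here is a minimal sketch of the idea. It assumes a toy in-memory store; the class `HandleStore` and its method signatures are hypothetical and are not duhwave's API. Only the operation names Peek, Search, and Slice come from the tool lists in this document.

```python
import re


class HandleStore:
    """Toy stand-in for a shared variable substrate (hypothetical, not duhwave's API)."""

    def __init__(self) -> None:
        self._values: dict[str, str] = {}

    def bind(self, name: str, value: str) -> str:
        """Bind a large input to a name and return that name as the handle."""
        self._values[name] = value
        return name

    def peek(self, name: str, n: int = 80) -> str:
        """Return only the first n characters of the bound value."""
        return self._values[name][:n]

    def search(self, name: str, pattern: str) -> list[int]:
        """Return the character offsets at which the regex pattern matches."""
        return [m.start() for m in re.finditer(pattern, self._values[name])]

    def slice(self, name: str, start: int, end: int) -> str:
        """Return the characters in [start, end) of the bound value."""
        return self._values[name][start:end]


store = HandleStore()
repo = store.bind("repo", "def main():\n    pass\n" * 10_000)

# A coordinator would hand the short handle name to a worker, not the text itself.
offsets = store.search(repo, r"def main")
snippet = store.slice(repo, offsets[0], offsets[0] + 11)
```

The worker in this sketch only ever receives the string `"repo"` plus the small results of its own queries, so the 210,000-character input never has to be inlined into a prompt.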

The architecture composes from six primitives: an RLM substrate (one Python REPL subprocess per session, large inputs bound as named variables), a persistent Task lifecycle (forward-only state machine, three execution surfaces, JSONL on disk), a coordinator-as-prompt-role (one engine, no subclasses, tool filtering by allowlist), recursive cross-agent links (workers see scoped handles, results bind back as new handles), an event-ingress layer (five listeners, one immutable Trigger type, one append-only TriggerLog), and a topology DSL (one TOML file, packed and signed as a .duhwave bundle).
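
As an illustration of the "forward-only state machine, JSONL on disk" primitive, the sketch below shows one way such a lifecycle could look. The state names, the `Task` class, and the record layout are all assumptions made for this example; the source does not list duhwave's actual states or file format.

```python
import json
import tempfile
from enum import IntEnum
from pathlib import Path


class TaskState(IntEnum):
    # Hypothetical states; the source does not name the real ones.
    PENDING = 0
    RUNNING = 1
    DONE = 2
    FAILED = 3


TERMINAL = {TaskState.DONE, TaskState.FAILED}


class Task:
    def __init__(self, task_id: str, log_path: Path) -> None:
        self.task_id = task_id
        self.state = TaskState.PENDING
        self._log_path = log_path
        self._append()

    def advance(self, new_state: TaskState) -> None:
        """Move forward only: never out of a terminal state, never backwards."""
        if self.state in TERMINAL or new_state <= self.state:
            raise ValueError(f"illegal transition {self.state.name} -> {new_state.name}")
        self.state = new_state
        self._append()

    def _append(self) -> None:
        # One JSON object per line; earlier lines are never rewritten.
        with self._log_path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps({"id": self.task_id, "state": self.state.name}) + "\n")


log = Path(tempfile.mkdtemp()) / "tasks.jsonl"
task = Task("t1", log)
task.advance(TaskState.RUNNING)
task.advance(TaskState.DONE)
history = [json.loads(line)["state"] for line in log.read_text(encoding="utf-8").splitlines()]
```

Because every transition appends a line rather than overwriting one, a restarted process in this sketch could recover a task's latest state by reading the last record for its id.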

Five ADRs

duhwave's design lives in five Architecture Decision Records. Each ADR ships with implementation under duh/duhwave/, unit and integration tests under tests/, and at least one runnable demo under examples/duhwave/.

| # | Title | Implementation | Tests | Demo |
| --- | --- | --- | --- | --- |
| 028 | RLM context engine | duh/duhwave/rlm/ | tests/duhwave/rlm/ | examples/duhwave/01_rlm_demo.py |
| 029 | Recursive cross-agent links | duh/duhwave/coordinator/spawn.py | tests/duhwave/coordinator/ | examples/duhwave/02_swarm_demo.py |
| 030 | Persistent Task lifecycle | duh/duhwave/task/ | tests/duhwave/task/ | examples/duhwave/repo_triage/main.py |
| 031 | Coordinator role + event ingress | duh/duhwave/coordinator/ · duh/duhwave/ingress/ | tests/duhwave/ingress/ | examples/duhwave/03_event_driven.py |
| 032 | Topology DSL + bundles + control plane | duh/duhwave/spec/ · duh/duhwave/bundle/ · duh/duhwave/cli/ | tests/duhwave/spec/ · tests/duhwave/bundle/ | examples/duhwave/04_topology_bundle.py |

Quick start

Install D.U.H., write a minimal swarm.toml, install the bundle, and start the host daemon. The daemon stays alive until you stop it: triggers spawn Tasks, Tasks run agents, and agents bind results back as REPL handles.

bash
# 1. Install D.U.H. (duhwave ships in the same package)
pip install duh-cli

# 2. Author a topology directory: foo.duhwave/{manifest.toml, swarm.toml, permissions.toml}
#    See examples/duhwave/repo_triage/ for the canonical layout.

# 3. Install the bundle into ~/.duh/waves/
duh wave install ./foo.duhwave

# 4. Start the host daemon
duh wave start

# 5. Inspect what is running
duh wave ls
duh wave inspect foo
duh wave logs --follow foo

# 6. Stop the daemon when done
duh wave stop

A minimal swarm.toml:

swarm.toml
[swarm]
name           = "hello-wave"
version        = "0.1.0"
format_version = 1

[[agents]]
id            = "coordinator"
role          = "coordinator"
model         = "anthropic/claude-haiku-4-5"
tools         = ["Spawn", "SendMessage", "Stop", "Peek", "Search", "Slice"]
system_prompt = "prompts/coordinator.md"

[[agents]]
id            = "worker"
role          = "worker"
model         = "anthropic/claude-haiku-4-5"
tools         = ["Read", "Grep", "Peek", "Search", "Slice"]
system_prompt = "prompts/worker.md"

[[triggers]]
kind            = "manual"
source          = "hello"
target_agent_id = "coordinator"

[[edges]]
from_agent_id = "coordinator"
to_agent_id   = "worker"
kind          = "spawn"

[budget]
max_concurrent_tasks = 3
max_usd_per_day      = 1.00

10 runnable demos

Every duhwave primitive is exercised by a runnable demo under examples/duhwave/. The parity_hermes/ set walks through five harness-parity demos; parity_claw/ walks through four control-plane demos. The headline showpieces (repo_triage/, agile_team/, real_e2e/, telegram_assistant/) are full end-to-end swarms.

| # | Demo | What it shows |
| --- | --- | --- |
| 1 | 01_rlm_demo.py | RLM REPL substrate: bind a 235K-char repo, peek + search by handle, never inline the bytes |
| 2 | 02_swarm_demo.py | Coordinator + two workers via Spawn; result bind-back as a new handle |
| 3 | 03_event_driven.py | All five ingress listeners: webhook, filewatch, cron, MCP push, manual seam |
| 4 | 04_topology_bundle.py | Parse swarm.toml, pack a deterministic .duhwave ZIP, install with TOFU trust |
| 5 | repo_triage/main.py | ~400 LOC end-to-end: bundle build → install → daemon → trigger → orchestrate → inspect → stop |
| 6 | agile_team/main.py | 5-stage pipeline (PM → Architect → Engineer → Tester → Reviewer); stub or real OpenAI runner |
| 7 | real_e2e/main.py | Real daemon subprocess + real HTTP webhook + real OpenAI completion + real outbox file |
| 8 | telegram_assistant/main.py | Persistent assistant with three flows: inbound webhook, scheduled cron, on-demand manual |
| 9 | parity_hermes/run_all.py | Five demos: multimode adapters, tool-arg repair, parallel dispatch, RLM-replaces-compaction, shared budget |
| 10 | parity_claw/run_all.py | Four demos: four ingress channels, persistent state, concurrent ingress, per-channel isolation |
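
Demo 4 mentions packing a deterministic ZIP. One common way to get byte-identical archives from identical inputs is to fix the entry order, timestamps, and permissions, as sketched below with the standard `zipfile` module. This shows the general technique only; `pack_deterministic` is a hypothetical helper, and the source does not say how duhwave's bundler achieves determinism.

```python
import hashlib
import io
import zipfile

FIXED_TIME = (1980, 1, 1, 0, 0, 0)  # earliest timestamp the ZIP format can store


def pack_deterministic(files: dict[str, bytes]) -> bytes:
    """Pack files so that identical inputs always yield identical bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in sorted(files):  # fixed entry order
            info = zipfile.ZipInfo(filename=name, date_time=FIXED_TIME)
            info.compress_type = zipfile.ZIP_DEFLATED
            info.external_attr = 0o644 << 16  # fixed permissions
            archive.writestr(info, files[name])
    return buffer.getvalue()


# The same files supplied in a different order produce the same archive.
bundle_a = pack_deterministic({"swarm.toml": b"[swarm]\n", "manifest.toml": b"# manifest\n"})
bundle_b = pack_deterministic({"manifest.toml": b"# manifest\n", "swarm.toml": b"[swarm]\n"})
digest_a = hashlib.sha256(bundle_a).hexdigest()
digest_b = hashlib.sha256(bundle_b).hexdigest()
```

With stable bytes, a single digest identifies a bundle, which is what makes signing or pinning it meaningful.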

Real-OpenAI benchmark

A single CLI invocation drives a 5-stage agile-team pipeline (PM → Architect → Engineer → Tester → Reviewer) against real OpenAI models. Each stage spawns a worker, reads exposed handles from the coordinator's RLM REPL, and binds its result back. The benchmark runs pytest against the produced implementation; the pass rate is the headline number.

| Metric | gpt-4o-mini | gpt-4o |
| --- | --- | --- |
| Stages completed | 5 / 5 | 5 / 5 |
| Wall time (single-threaded) | 35.5 s | 29.3 s |
| Total prompt tokens | 3,934 | 4,706 |
| Total completion tokens | 1,553 | 1,900 |
| Estimated cost | $0.0015 | $0.0308 |
| Cost ratio | 1× | ~20× |
| pytest on produced code | 3/5 pass | 5/6 pass |
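
The cost rows can be reproduced from the token counts. The sketch below assumes per-million-token list prices of $0.15 input / $0.60 output for gpt-4o-mini and $2.50 input / $10.00 output for gpt-4o. Those rates are an assumption and are not stated in this document, though they do reproduce the table's figures.

```python
# Assumed USD list prices per 1M tokens (input, output); not stated in the source.
PRICES = {
    "gpt-4o-mini": (0.15, 0.60),
    "gpt-4o": (2.50, 10.00),
}
# Token counts (prompt, completion) copied from the benchmark table.
USAGE = {
    "gpt-4o-mini": (3_934, 1_553),
    "gpt-4o": (4_706, 1_900),
}


def estimated_cost(model: str) -> float:
    """Return the estimated USD cost of one pipeline run for the given model."""
    prompt_tokens, completion_tokens = USAGE[model]
    input_price, output_price = PRICES[model]
    return (prompt_tokens * input_price + completion_tokens * output_price) / 1_000_000


mini = estimated_cost("gpt-4o-mini")
full = estimated_cost("gpt-4o")
summary = f"{mini:.4f} {full:.4f} {full / mini:.1f}x"
```

Under these assumed rates the two runs come to about $0.0015 and $0.0308, a ratio of roughly 20.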

Headline finding. Real coordination defects surface naturally: the Reviewer agent reads the code but does not execute it, so it misses test failures that the Tester introduced. gpt-4o-mini's test_error_handling fails because an earlier time.sleep(2) refills the bucket; gpt-4o's test_rate_limiter_thread_safety references threading.Thread, but the test file imports only pytest and time. Both Reviewers issued APPROVE. The obvious next step is to add a Runner role that executes the test suite and binds the failures back as a handle for the Reviewer to peek. The architecture composes.
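
A Runner step of the kind proposed here could be as small as the following sketch. It assumes a plain dictionary as the handle store. The helper `run_and_bind`, the handle name `test_report`, and the `REQUEST_CHANGES` verdict are hypothetical; only APPROVE appears in the benchmark write-up.

```python
import subprocess
import sys


def run_and_bind(handles: dict[str, str], name: str, command: list[str]) -> int:
    """Run command, bind its combined output under name, and return the exit code."""
    result = subprocess.run(command, capture_output=True, text=True)
    handles[name] = result.stdout + result.stderr
    return result.returncode


handles: dict[str, str] = {}

# Stand-in for a failing test run; a real Runner might instead invoke
# [sys.executable, "-m", "pytest", "-q"] inside the produced project.
fake_suite = [sys.executable, "-c", "import sys; print('1 failed, 4 passed'); sys.exit(1)"]

exit_code = run_and_bind(handles, "test_report", fake_suite)

# The Reviewer can now base its verdict on an executed result, not on reading alone.
verdict = "APPROVE" if exit_code == 0 else "REQUEST_CHANGES"
```

In this sketch the Reviewer only needs the exit code and a peek at `test_report`, so the full test output never has to be pasted into its prompt.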

Full write-up with per-stage ledger, cost breakdown, and reproducibility steps: benchmarks/duhwave-agile/RESULT.md.

Architecture

duhwave's five ADRs form a dependency DAG. ADR-028 (RLM substrate) is the foundation; everything else builds upward.

                  ┌──────────────────────────────────────┐
                  │          ADR-032 - topology          │
                  │   swarm.toml · .duhwave bundles ·    │
                  │      duh wave CLI control plane      │
                  └─────────────┬────────────────────────┘
                                │ declares
                  ┌─────────────▼────────────────────────┐
                  │    ADR-031 - coordinator + ingress   │
                  │ Role / tool filtering · 5 listeners ·│
                  │   SubscriptionMatcher · TriggerLog   │
                  └──────┬────────────────────────┬──────┘
                         │ spawns                 │ persists
              ┌──────────▼─────────┐   ┌──────────▼──────────┐
              │ ADR-029 - Spawn    │   │ ADR-030 - Task      │
              │ cross-agent        │   │ state machine       │
              │ handle exposure ───┼──▶│ 3 execution surfaces│
              │ + bind-back        │   │ JSONL persistence   │
              └─────────┬──────────┘   └──────────┬──────────┘
                        │                         │
                        │  both stand on          │
                        ▼                         ▼
                  ┌──────────────────────────────────────┐
                  │     ADR-028 - RLM context engine     │
                  │   one Python REPL subprocess per     │
                  │   session · handles · Peek / Search  │
                  │   / Slice · cycle-detected recursion │
                  └──────────────────────────────────────┘
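
The dependency DAG in the diagram can be written down and ordered with the standard-library `graphlib` module. The edge set below is read off the diagram. Treating the horizontal arrow as ADR-029 depending on ADR-030 is an interpretation, not something the text states.

```python
from graphlib import TopologicalSorter

# Each ADR maps to the set of ADRs it builds on.
DEPENDS_ON = {
    "ADR-028": set(),
    "ADR-030": {"ADR-028"},
    "ADR-029": {"ADR-028", "ADR-030"},  # assumption: horizontal arrow read as a dependency
    "ADR-031": {"ADR-029", "ADR-030"},
    "ADR-032": {"ADR-031"},
}

# static_order() yields every node after all of its dependencies.
build_order = list(TopologicalSorter(DEPENDS_ON).static_order())
```

The resulting order starts at ADR-028 and ends at ADR-032, matching the "foundation first, everything else builds upward" description.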

See also