TokenChef (Part 3): wtcraft — A Lightweight, Git-Native Scaffolding for Bounded Multi-Agent Coding
👨‍🍳 Series: TokenChef (Git-Native Multi-Agent Coding)
- Part 1: Vibe Coding with Git Worktrees: A Playbook Most Devs Are Missing
- Part 2: Chief Token Orchestrator: Manage Claude, Codex, and Gemini as a Structured Software Team
- 👉 Part 3: wtcraft: A Lightweight, Git-Native Scaffolding for Bounded Multi-Agent Coding (Current)
Contents
- Introduction
- What is Harness Engineering?
- The Competitive Landscape
- The Contract: .worktree-task.md
- Deterministic (D) vs. Agentic (A) Tagging
- wtcraft: The Local Scaffolding CLI
- Getting Started in 10 Seconds
Introduction
In Part 2: Chief Token Orchestrator, we explored the strategic shift from naive, parallel agents to a structured, layered agent team designed to protect your Token, Context, and Review budgets.
But design philosophies are useless without a mechanism to enforce them.
If you tell a coding agent to “fix this issue,” but don’t give it a strict sandbox and a verifiable boundary, it will wander off, touch files it shouldn’t, write unnecessary code, and blow through your API quota.
To prevent this, we need Harness Engineering. This article explores the tactical tools, the competitive task landscape, and how the lightweight, git-native CLI wtcraft implements bounded contracts on your local machine.
What is Harness Engineering?
In modern software engineering, a model (like Claude, Codex, or Gemini) is just the engine. The harness is the vehicle.
Martin Fowler defined Harness Engineering as the infrastructure, state management, error recovery, and boundary enforcement that wraps an AI model. An autonomous agent needs rails to keep it from derailing.
Major tech companies have already proven this concept in production:
- Stripe’s Minions Project: Stripe runs autonomous “minions” to write, test, and submit pull requests across hundreds of millions of lines of payments code. They handle 1,300+ AI PRs per week by enforcing strict, one-shot task boundaries.
- OpenAI’s Codex Engineering: OpenAI’s team successfully integrated parallel Codex executors by wrapping them in rigid sandbox environments, ensuring that agent activity was monitored and strictly verification-gated.
wtcraft is a lightweight, local-first implementation of these enterprise principles—built for solo developers who want production-grade discipline without heavy platform overhead or expensive cloud costs.
The Competitive Landscape
How do current platforms and agent formats guide execution? Let’s map the landscape.
1. Agent Platforms and Orchestrators
Platforms dictate where agents execute and how tasks are displayed.
| Tool | What it does | Strength | Gap | How wtcraft fits |
|---|---|---|---|---|
| Codex App | OpenAI’s desktop command center for managing parallel agents | Outstanding task UI and cross-stream visibility | Cloud-centric; struggles with local device state or private toolchains | Adds file-boundary contracts and budget gates for local execution |
| Codex Cloud | Sandboxed cloud containers provisioned by OpenAI | Zero-setup workspace isolation | No access to local databases, simulators, or private credentials | Wraps local tasks in a durable contract that bridges cloud and local builds |
| Claude Code Worktrees | Git-native worktree isolation (--worktree) | Native Git speed; great repo exploration | Token budget, handoff formats, and file scope are left to the user | Supplies the missing contract layer: Scope and Off-limits blocks |
| workmux | Tmux layout wrapper for parallel git worktrees | Low-friction terminal workspace management | Manages terminal layout, not task execution or file boundaries | .worktree-task.md adds intent contracts inside workmux sessions |
2. Task Formats and Skill Specs
Task formats dictate how an agent is instructed.
| Tool | Format | What it defines | Gap vs .worktree-task.md |
|---|---|---|---|
| Devin AI | SKILL.md | Recurring workflow patterns for an agent | Describes a pattern, not a task execution unit. No Scope or Verification lifecycle. |
| Sweep AI | Custom YAML | Prompts and triggers for Sweep PR generation | Focused on automated PR behavior, not individual bounded worktree tasks. |
| GitHub Copilot Workspace | Ephemeral Plan | Interactive UI-scoped step list | Session-only. Not readable by other agents when the session ends; no persistent Off-limits rules. |
| SWE-agent | Global YAML | Configures model tools and maximum step counts | Configures the agent itself, not individual, disjoint code tasks. |
| Claude SubAgents | .claude/agents/*.md | Reusable sub-agent tool configurations | Defines sub-agents, not individual, verifiable branch contracts. |
If we evaluate these formats against the six properties of a perfect task contract, we find a stark gap:
| Tool | Repo-native | Scope | Off-limits | Verification | Status lifecycle | Worktree-native |
|---|---|---|---|---|---|---|
| Devin SKILL.md | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ |
| Sweep Custom YAML | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ |
| Copilot Workspace | ✗ | partial | ✗ | ✗ | ✗ | ✗ |
| SWE-agent config | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ |
| Claude SubAgents | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ |
| .worktree-task.md | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
Every existing format either defines how an agent behaves globally or what to build in a free-text prompt. None defines exactly which files an agent is allowed to touch, which files are strictly off-limits, and what deterministic tests prove it stayed inside the lines.
That is the gap wtcraft closes.
The Contract: .worktree-task.md
In a wtcraft-enabled repository, every active branch worktree contains a task contract: .worktree-task.md at its root.
Here is what a real contract looks like:
---
branch: feat/billing-sync
agent: codex
status: ready
created: 2026-05-26
priority: high
base: main
---
## Scope
- src/billing/sync.py
- src/billing/sync_test.py
## Steps
- [ ] [D] Read files in Scope and Context in full.
- [ ] [A] Implement only the billing sync client.
- [ ] [D] Run verification checks and capture outcomes.
## Off-limits
- src/db/schema.py
- package.json
- CLAUDE.md
- AGENTS.md
- GEMINI.md
## Context
Follow the existing sync pattern defined in `src/billing/legacy_sync.py`.
## Verification
- [ ] python -m unittest src/billing/sync_test.py
- [ ] flake8 src/billing/sync.py
This file is the single source of truth. The Orchestrator scaffolds it, the Planner (Claude Opus) designs it, the Executor (Codex) reads it, the Verifier (Claude) checks it, and the Finisher (Gemini) verifies and cleans it.
Deterministic [D] vs. Agentic [A] Tagging
Notice the tags in the checklist:
- [D] (Deterministic): Actions grounded in git or shell reality: reading files, executing compiler checks, running linter rules, and verifying exit codes.
- [A] (Agentic): Actions requiring semantic reasoning: generating implementations, interpreting test errors, and making coding choices.
This separation is vital. Deterministic steps are the ground truth.
If a [D] verification step fails (e.g. flake8 returns exit code 1), the Executor agent is not allowed to ignore it or mark the task complete. The harness (wtcraft) detects the exit code, halts the execution loop, and demands a deterministic fix or a re-plan. This keeps “done” from being a matter of the model’s opinion: a task is complete only when its deterministic checks pass.
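The gating rule can be sketched as follows. This illustrates the idea and is not wtcraft's implementation: `run_gate` and `gate_passed` are hypothetical names, and the sketch assumes each verification entry is a plain command line whose exit code is the only signal that matters.

```python
import shlex
import subprocess


def run_gate(commands: list[str], cwd: str = ".") -> list[tuple[str, int]]:
    """Run each command in order and stop at the first non-zero exit code.

    Returns (command, exit_code) pairs for every command that actually ran.
    """
    results = []
    for command in commands:
        try:
            code = subprocess.run(shlex.split(command), cwd=cwd).returncode
        except FileNotFoundError:
            code = 127  # conventional shell status for "command not found"
        results.append((command, code))
        if code != 0:
            break  # deterministic failure: halt instead of continuing
    return results


def gate_passed(results: list[tuple[str, int]], expected: int) -> bool:
    """A task may be marked complete only if every check ran and exited 0."""
    return len(results) == expected and all(code == 0 for _, code in results)
```

The point of the design is that the agent never gets a vote: the caller compares exit codes, and anything other than a full run of zeros blocks completion.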
wtcraft: The Local Scaffolding CLI
wtcraft is a lightweight, non-invasive CLI that implements this contract layer. It doesn’t replace your editors, agent apps, or custom configurations. It sits one layer below, acting as the guardrails.
Core CLI Commands
- wtcraft init: Scaffolds the harness directories and templates. If you want integration with Claude, Codex, or Gemini CLI, opt in with:
  wtcraft init --patch-agent-files
  This appends a tiny, non-invasive routing block to CLAUDE.md, AGENTS.md, and GEMINI.md. Your custom project instructions remain completely untouched.
- wtcraft new <branch-name>: Automatically provisions a new git worktree sandbox, checks out a clean branch, and scaffolds a custom .worktree-task.md contract.
- wtcraft status: Scans your worktree directories and displays a clean console matrix of all active tasks, assigned agents, priorities, and lifecycles.
- wtcraft check <task>: Automatically audits modified files against the contract’s Scope and Off-limits lists. If an agent modified a file listed in Off-limits, wtcraft check fails, alerting you before any code is committed.
- wtcraft verify <task>: Executes the verification test commands defined in the contract and reports a clean, structured pass/fail telemetry matrix.
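The audit described for `wtcraft check` can be approximated in a few lines. The sketch below is an assumption about the logic, not the tool's source: `changed_files` and `audit_changes` are hypothetical names, paths are compared as exact repo-relative strings with no glob support, and the changed-file list comes from `git diff --name-only` against the contract's `base` branch (which does not report untracked files). The article only states that touching an Off-limits file fails the check, so files that are merely outside Scope are reported separately here rather than treated as failures.

```python
import subprocess


def changed_files(base: str = "main", cwd: str = ".") -> list[str]:
    """Tracked files whose working-tree content differs from `base`."""
    out = subprocess.run(
        ["git", "diff", "--name-only", base],
        cwd=cwd, capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def audit_changes(
    changed: list[str], scope: list[str], off_limits: list[str]
) -> dict:
    """Classify changed paths against the contract's Scope and Off-limits."""
    allowed, forbidden = set(scope), set(off_limits)
    violations = sorted(p for p in changed if p in forbidden)
    out_of_scope = sorted(
        p for p in changed if p not in allowed and p not in forbidden
    )
    return {
        "violations": violations,      # touched an Off-limits file: fail
        "out_of_scope": out_of_scope,  # outside Scope: surfaced for review
        "ok": not violations,
    }
```

Keeping `audit_changes` free of any git calls makes the rule itself easy to test; only `changed_files` depends on the repository state.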
Getting Started in 10 Seconds
You can install wtcraft locally using Homebrew, npm, or pipx:
# Node / npm
npm install -g wtcraft
# Python / pipx (recommended)
pipx install wtcraft
# Homebrew (macOS)
brew tap zywkloo/wtcraft https://github.com/zywkloo/wtcraft && brew install wtcraft
Once installed, simply run wtcraft init in your repository root, and start orchestrating your own layered agent team with strict, verifiable boundaries.
Harness engineering is how solo developers go from wrangling loose models to managing a high-performance, cost-effective digital software team.