👨‍🍳 TokenChef (Part 03)

May 29, 2026

wtcraft — A Lightweight, Git-Native Governance Core for Claude Code, Codex, and Gemini

👨‍🍳 Series: TokenChef (Git-Native Multi-Agent Coding)

Part 2: Chief Token Orchestrator — Manage Claude, Codex, and Gemini as a Structured Software Team
👉 Part 3: wtcraft — A Lightweight, Git-Native Governance Core for Claude Code, Codex, and Gemini (Current)
Part 4: Stop Reinventing the Agent — wtcraft Boundaries, Ledgers & Sign-off First / 别卷 Agent 了：wtcraft 要先把边界、账本和验收做好

Introduction
When wtcraft earns its keep (and when to skip it)
What is Harness Engineering?
The Competitive Landscape
The Contract: .worktree-task.md
Deterministic (D) vs. Agentic (A) Tagging
wtcraft: The Local Scaffolding CLI
Getting Started in 10 Seconds

Introduction

In Part 2: Chief Token Orchestrator, we explored the strategic shift from naive, parallel agents to a structured, layered agent team designed to protect your Token, Context, and Review budgets.

But design philosophies are useless without a mechanism to enforce them.

If you tell a coding agent to “fix this issue,” but don’t give it a strict sandbox and a verifiable boundary, it will wander off, touch files it shouldn’t, write unnecessary code, and blow through your API quota.

To prevent this, we need Harness Engineering. This article explores the tactical tools, the competitive task landscape, and how the lightweight, git-native CLI wtcraft implements bounded contracts on your local machine.

When wtcraft earns its keep (and when to skip it)

Before the mechanics, the honest scoping — because a tool that can’t tell you when not to use it isn’t worth trusting.

wtcraft is built for one specific situation: you run more than one agent vendor — say Claude Code, Codex, and the Gemini CLI — on a limited budget, and you want to spread work across them (to stretch free tiers and cheaper quota) without them tripping over each other. That’s the seam where coordination actually hurts: two vendors editing the same files, blowing past scope, or dropping the task hand-off between them. wtcraft’s whole job is to keep each one in its lane — a per-task contract, isolated worktrees, and a verifiable sign-off.

If that’s not you, skip it. If you run a single vendor on an unlimited plan, you don’t need an external governance layer — that product’s own sub-agent orchestration already handles the coordination inside its own walls. wtcraft earns its keep precisely at the multi-vendor + budget seam, not inside one vendor’s garden. And because it’s git-native — no daemon, no database, state lives in your repo — adopting it costs you nothing to try and nothing to walk away from.

What is Harness Engineering?

In modern software engineering, a model (like Claude, Codex, or Gemini) is just the engine. The harness is the vehicle.

Martin Fowler defined Harness Engineering as the infrastructure, state management, error recovery, and boundary enforcement that wraps an AI model. An autonomous agent needs rails to keep it from derailing.

Major tech companies have already proven this concept in production:

Stripe’s Minions Project: Stripe runs autonomous “minions” to write, test, and submit pull requests across hundreds of millions of lines of payments code. They handle 1,300+ AI PRs per week by enforcing strict, one-shot task boundaries.
OpenAI’s Codex Engineering: OpenAI’s team successfully integrated parallel Codex executors by wrapping them in rigid sandbox environments, ensuring that agent activity was monitored and strictly verification-gated.

wtcraft is a lightweight, local-first implementation of these enterprise principles—built for solo developers who want production-grade discipline without heavy platform overhead or expensive cloud costs.

The Competitive Landscape

How do current platforms and agent formats guide execution? Let’s map the landscape.

1. Agent Platforms and Orchestrators

Platforms dictate where agents execute and how tasks are displayed.

| Tool | What it does | Strength | Gap | How wtcraft fits |
| --- | --- | --- | --- | --- |
| Codex App | OpenAI’s desktop command center for managing parallel agents | Outstanding task UI and cross-stream visibility | Cloud-centric; struggles with local device state or private toolchains | Adds file-boundary contracts and budget gates for local execution |
| Codex Cloud | Sandboxed cloud containers provisioned by OpenAI | Zero-setup workspace isolation | No access to local databases, simulators, or private credentials | Wraps local tasks in a durable contract that bridges cloud and local builds |
| Claude Code Worktrees | Git-native worktree isolation (`--worktree`) | Native Git speed; great repo exploration | Token budget, handoff formats, and file scope are left to the user | Supplies the missing contract layer: `Scope` and `Off-limits` blocks |
| workmux | Tmux layout wrapper for parallel git worktrees | Low-friction terminal workspace management | Manages terminal layout, not task execution or file boundaries | `.worktree-task.md` adds intent contracts inside workmux sessions |

2. Task Formats and Skill Specs

Task formats dictate how an agent is instructed.

| Tool | Format | What it defines | Gap vs `.worktree-task.md` |
| --- | --- | --- | --- |
| Devin AI | `SKILL.md` | Recurring workflow patterns for an agent | Describes a pattern, not a task execution unit. No Scope or Verification lifecycle. |
| Sweep AI | Custom YAML | Prompts and triggers for Sweep PR generation | Focused on automated PR behavior, not individual bounded worktree tasks. |
| GitHub Copilot Workspace | Ephemeral Plan | Interactive UI-scoped step list | Session-only. Not readable by other agents once the session ends; no persistent `Off-limits` rules. |
| SWE-agent | Global YAML | Configures model tools and maximum step counts | Configures the agent itself, not individual, disjoint code tasks. |
| Claude SubAgents | `.claude/agents/*.md` | Reusable sub-agent tool configurations | Definitions for sub-agents, not individual, verifiable branch contracts. |

If we evaluate these formats against the six properties of a perfect task contract, we find a stark gap:

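| Tool | Repo-native | Scope | Off-limits | Verification | Status lifecycle | Worktree-native |
| --- | --- | --- | --- | --- | --- | --- |
| Devin SKILL.md | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ |
| Sweep Custom YAML | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ |
| Copilot Workspace | ✗ | partial | ✗ | ✗ | ✗ | ✗ |
| SWE-agent config | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ |
| Claude SubAgents | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ |
| `.worktree-task.md` | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |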

Every existing format either defines how an agent behaves globally or what to build in a free-text prompt. None defines exactly which files an agent is allowed to touch, which files are strictly off-limits, and what deterministic tests prove it stayed inside the lines.

That is the gap wtcraft closes.

The Contract: `.worktree-task.md`

In a wtcraft-enabled repository, every active branch worktree contains a task contract: .worktree-task.md at its root.

Here is what a real contract looks like:

```
---
branch: feat/billing-sync
agent: codex
status: ready
created: 2026-05-26
priority: high
base: main
---

## Scope

- src/billing/sync.py
- src/billing/sync_test.py

## Steps

- [ ] [D] Read files in Scope and Context in full.
- [ ] [A] Implement only the BLE offline sync client.
- [ ] [D] Run verification checks and capture outcomes.

## Off-limits

- src/db/schema.py
- package.json
- CLAUDE.md
- AGENTS.md
- GEMINI.md

## Context

Follow the existing sync pattern defined in `src/billing/legacy_sync.py`.

## Verification

- [ ] python -m unittest src/billing/sync_test.py
- [ ] flake8 src/billing/sync.py
```
This file is the single source of truth. The Orchestrator scaffolds it, the Planner (Claude Opus) designs it, the Executor (Codex) reads it, the Verifier (Claude) checks it, and the Finisher (Gemini) verifies and cleans it.
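Because the contract is plain text, any agent or script can read it without special tooling. The sketch below shows one way to load it, assuming the front matter is flat `key: value` pairs and that every section starts with a `## ` heading, as in the example above. The names `TaskContract`, `parse_contract`, and `bullet_items` are hypothetical and are not part of wtcraft's API.

```python
import re
from dataclasses import dataclass, field


@dataclass
class TaskContract:
    """In-memory view of one .worktree-task.md file (hypothetical structure)."""
    meta: dict = field(default_factory=dict)      # front matter, e.g. {"agent": "codex"}
    sections: dict = field(default_factory=dict)  # heading -> list of non-empty lines


def parse_contract(text: str) -> TaskContract:
    """Split a contract into front-matter metadata and '## ' sections."""
    contract = TaskContract()
    lines = text.strip().splitlines()
    body_start = 0
    # Front matter: flat "key: value" pairs between two '---' lines (assumed format).
    if lines and lines[0].strip() == "---":
        end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
        if end is None:
            raise ValueError("front matter is missing its closing '---'")
        for line in lines[1:end]:
            key, _, value = line.partition(":")
            contract.meta[key.strip()] = value.strip()
        body_start = end + 1
    current = None
    for line in lines[body_start:]:
        heading = re.match(r"^##\s+(.+)$", line)
        if heading:
            current = heading.group(1).strip()
            contract.sections[current] = []
        elif current and line.strip():
            contract.sections[current].append(line.strip())
    return contract


def bullet_items(contract: TaskContract, section: str) -> list:
    """Return the '- item' entries of a section, without the bullet or '[ ]' checkbox."""
    items = []
    for line in contract.sections.get(section, []):
        match = re.match(r"^-\s+(?:\[[ xX]\]\s+)?(.*)$", line)
        if match:
            items.append(match.group(1))
    return items


EXAMPLE = """---
branch: feat/billing-sync
agent: codex
status: ready
---

## Scope

- src/billing/sync.py
- src/billing/sync_test.py

## Verification

- [ ] python -m unittest src/billing/sync_test.py
- [ ] flake8 src/billing/sync.py
"""

contract = parse_contract(EXAMPLE)
print(contract.meta["agent"])
print(bullet_items(contract, "Scope"))
```

A hand-rolled parser keeps the sketch dependency-free. A real tool might prefer a YAML library for the front matter, especially if values ever become nested.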

Deterministic `[D]` vs. Agentic `[A]` Tagging

Notice the tags in the checklist:

[D] (Deterministic): Actions grounded in git or shell state, such as reading files, running compiler checks, applying linter rules, and verifying exit codes.
[A] (Agentic): Actions that require semantic reasoning, such as generating implementations, interpreting test errors, and making coding choices.

This separation is vital. Deterministic steps are the ground truth.

If a [D] verification step fails (for example, flake8 exits with code 1), the Executor agent is not allowed to ignore it or mark the task complete. The harness (wtcraft) detects the exit code, halts the execution loop, and demands a deterministic fix or a re-plan. Completion is decided by exit codes rather than by the model’s own judgment, which keeps LLM “vibe coding” out of the sign-off path.
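The exit-code gate described above can be sketched in a few lines. This is an illustration of the idea, not wtcraft's actual implementation. The function names `run_verification` and `may_mark_done` are hypothetical, and the commands are assumed to be simple argument lists with no shell pipes or redirects.

```python
import shlex
import subprocess
import sys


def run_verification(commands, cwd=None):
    """Run each command in order; return (command, exit_code) pairs.

    Stops at the first non-zero exit code, so later commands never run.
    """
    results = []
    for command in commands:
        try:
            code = subprocess.run(shlex.split(command), cwd=cwd,
                                  capture_output=True, text=True).returncode
        except FileNotFoundError:
            code = 127  # conventional shell code for "command not found"
        results.append((command, code))
        if code != 0:
            break  # halt the loop: a failed [D] step blocks everything after it
    return results


def may_mark_done(results, commands):
    """True only when every command ran and every one exited with 0."""
    return len(results) == len(commands) and all(code == 0 for _, code in results)


# Stand-in commands so the sketch runs anywhere; a real contract would list
# entries such as "flake8 src/billing/sync.py".
python = shlex.quote(sys.executable)
commands = [
    f"{python} -c 'import sys; sys.exit(1)'",  # plays the role of a failing linter
    f"{python} -c pass",                        # never runs: the gate already halted
]
results = run_verification(commands)
print(may_mark_done(results, commands))  # False
```

The decision function never looks at model output, only at exit codes. An agent can explain a failure, but in this sketch it cannot argue its way past one.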

wtcraft: The Local Scaffolding CLI

wtcraft is a lightweight, non-invasive CLI that implements this contract layer. It doesn’t replace your editors, agent apps, or custom configurations. It sits one layer below, acting as the guardrails.

Core CLI Commands

wtcraft init: Scaffolds the harness directories and templates. If you want integration with Claude, Codex, or Gemini CLI, opt in with:
```
wtcraft init --patch-agent-files
```
This appends a tiny, non-invasive routing block to CLAUDE.md, AGENTS.md, and GEMINI.md. Your custom project instructions remain completely untouched.
wtcraft new <branch-name>: Automatically provisions a new git worktree sandbox, checks out a clean branch, and scaffolds a custom .worktree-task.md contract.
wtcraft status: Scans your worktree directories and displays a clean console matrix of all active tasks, assigned agents, priorities, and lifecycles.
wtcraft check <task>: Automatically audits modified files against the contract’s Scope and Off-limits lists. If an agent modified a file listed in Off-limits, wtcraft check fails, alerting you before any code is committed.
wtcraft verify <task>: Executes the verification test commands defined in the contract and reports a clean, structured pass/fail telemetry matrix.
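To make the boundary audit concrete, here is a minimal sketch of the comparison that a command like wtcraft check performs. It is written under assumptions and is not taken from wtcraft's source. The helper names `changed_files` and `audit` are hypothetical, paths are matched exactly (whether wtcraft supports glob patterns is not stated here), and the file list comes from `git diff --name-only`, which reports tracked files only.

```python
import subprocess


def changed_files(base="main", cwd=None):
    """Tracked files that differ from `base`, via `git diff --name-only`.

    Untracked files are not reported by this command; a fuller audit would
    also need something like `git status --porcelain`.
    """
    out = subprocess.run(["git", "diff", "--name-only", base], cwd=cwd,
                         capture_output=True, text=True, check=True).stdout
    return [line for line in out.splitlines() if line]


def audit(modified, scope, off_limits):
    """Classify modified paths against the contract's Scope and Off-limits lists."""
    scope, off_limits = set(scope), set(off_limits)
    violations = {"off_limits": [], "out_of_scope": []}
    for path in modified:
        if path in off_limits:
            violations["off_limits"].append(path)    # explicitly forbidden
        elif path not in scope:
            violations["out_of_scope"].append(path)  # not forbidden, but not granted
    return violations


# Usage with a hand-written file list instead of a live repository.
report = audit(
    modified=["src/billing/sync.py", "package.json", "README.md"],
    scope=["src/billing/sync.py", "src/billing/sync_test.py"],
    off_limits=["src/db/schema.py", "package.json"],
)
print(report)
```

Separating the two violation kinds mirrors the contract. Touching an Off-limits file is a hard failure, while an out-of-scope edit may only mean the Scope list needs a re-plan.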

Getting Started in 10 Seconds

You can install wtcraft locally using Homebrew, npm, or pipx:

```
# Node / npm
npm install -g wtcraft

# Python / pipx (recommended)
pipx install wtcraft

# Homebrew (macOS)
brew tap zywkloo/wtcraft https://github.com/zywkloo/wtcraft && brew install wtcraft
```

Once installed, simply run wtcraft init in your repository root, and start orchestrating your own layered agent team with strict, verifiable boundaries.

Harness engineering is how solo developers go from wrangling loose models to managing a high-performance, cost-effective digital software team.

Repository: github.com/zywkloo/wtcraft
Part 2: Chief Token Orchestrator: Manage Claude, Codex, and Gemini as a Structured Software Team

🔗Citation (BibTeX)

@misc{zhang2026,
  title = {wtcraft — A Lightweight, Git-Native Governance Core for Claude Code, Codex, and Gemini},
  author = {Victor Zhang},
  year = {2026},
  howpublished = {\url{https://zywkloo.github.io/blog/wtcraft-lightweight-git-native-multi-agent-scaffolding/}}
}