The harness is the product.
Harness Engineering is a twelve-part illustrated essay series about building agents that actually work. It reads the primary literature closely and tries to leave you with two things: a picture, and something you can build.
The premise
Two teams deploy the same frontier model. One ships an agent that resolves most of its tasks; the other ships one that stalls, loops, and quietly fails. Nothing about the model varied — something else did almost all of the work. That something is the harness: the system wrapped around a model that assembles its context, routes its tools, verifies its outputs, remembers across steps, and decides when to stop.
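To make the five responsibilities concrete, here is a minimal sketch of a harness loop. It is an illustration under stated assumptions, not the series' reference implementation: the `Harness` class, the `"tool arg"` / `"answer text"` action format, and the scripted stand-in model are all hypothetical names invented for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Harness:
    """Toy harness: every name and the action format here are assumptions."""
    model: Callable[[str], str]             # context -> "tool arg" or "answer text"
    tools: dict[str, Callable[[str], str]]  # tool name -> implementation
    verify: Callable[[str], bool]           # accepts or rejects a final answer
    max_steps: int = 5                      # stop rule: hard step budget
    memory: list[str] = field(default_factory=list)

    def build_context(self, task: str) -> str:
        # Assemble context: the task plus everything remembered so far.
        return "\n".join([f"TASK: {task}", *self.memory])

    def run(self, task: str) -> str | None:
        for _ in range(self.max_steps):
            action = self.model(self.build_context(task))
            name, _, arg = action.partition(" ")
            if name == "answer":
                if self.verify(arg):        # verify outputs before returning
                    return arg
                self.memory.append(f"REJECTED: {arg}")
                continue
            tool = self.tools.get(name)     # route the call to a tool
            result = tool(arg) if tool else f"unknown tool {name!r}"
            self.memory.append(f"{name}({arg}) -> {result}")  # remember across steps
        return None                         # budget exhausted: stop without an answer


def scripted_model(context: str) -> str:
    # Stand-in for a real model: call the tool once, then answer with its result.
    for line in context.splitlines():
        if line.startswith("add("):
            return "answer " + line.rsplit("-> ", 1)[1]
    return "add 2,3"


def add(arg: str) -> str:
    a, b = arg.split(",")
    return str(int(a) + int(b))


harness = Harness(model=scripted_model, tools={"add": add}, verify=str.isdigit)
result = harness.run("What is 2 + 3?")

# A model that never produces a usable action exhausts the step budget.
stalled = Harness(model=lambda ctx: "noop", tools={}, verify=str.isdigit)
stalled_result = stalled.run("anything")
```

The same `scripted_model` succeeds or stalls depending only on what surrounds it: remove the `add` tool or the memory and the loop never reaches a verified answer, which is the premise in miniature.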
The organizing claim of the series is that a deployed agent's capability factors into model quality and harness quality — and in 2026 the harness term has the steeper gradient. Spending a month improving the harness buys more deployed capability than waiting for the next model. Each essay follows one thread of that argument from its primary sources to a claim you can argue with, and ends in a deployable artifact.
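One way to write the organizing claim down, as a sketch only: the symbols $C$, $M$, $H$, and $F$ below are assumptions made for this illustration and are not necessarily the shared notation the essays define.

```latex
% Assumed symbols: C = deployed capability, M = model quality, H = harness quality.
C = F(M, H), \qquad
\Delta C \approx \frac{\partial F}{\partial M}\,\Delta M
              + \frac{\partial F}{\partial H}\,\Delta H .

% The claim, read as a statement about the current operating point:
\frac{\partial F}{\partial H}\,\Delta H_{\text{one month of harness work}}
\;>\;
\frac{\partial F}{\partial M}\,\Delta M_{\text{next model release}} .
```

Read this way, the claim is local and empirical: it compares the two first-order terms at today's operating point and says nothing about which term dominates once the harness is already strong.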
How to read it
The twelve essays are organized into four parts and build on one another, but each stands on its own. Start at The Harness Is the Product for the argument in order, browse the contents, or jump to whatever pulls you. Every essay carries its anchor papers up top, key takeaways inline, full references at the end, and a shared notation defined once and reused throughout.
- Part I · The Problem — Name the discipline; expose the measurement crisis. (1, 2)
- Part II · The Engine — Build the agent's training loop — environments, learning, memory, tools. (3, 4, 5, 6)
- Part III · The Build — Navigate the hard problems — planning, coordination, the proving ground. (7, 8, 9)
- Part IV · The Deployment Frontier — Ship it — security, operations, and autonomous improvement. (10, 11, 12)
The research threads
Ten tags trace the threads that weave through the series:
- AG — Agents & Harness
- AR — Agentic RL
- CL — Continual Learning
- ME — Memory
- TO — Tools & Skills
- MA — Multi-Agent
- SW — Software Engineering
- EV — Evaluation
- SE — Security
- FN — Foundations
On the illustrations
The pictures are deliberately diagrammatic: hand-built inline SVG (architecture sketches, scaling curves, audit tables, threat loops) and pixel-art silhouettes that re-tint in dark mode. Every figure is grounded: each datapoint, curve inflection, and timeline node maps to a result in a cited paper. The aim is the same as that of the prose: a clear silhouette of an idea.
The sister series
Harness Engineering is the deployment-layer companion to Continual Intelligence. Where Continual Intelligence covers the mechanisms of learning — plasticity, the big-world hypothesis, reasoning, open-ended evolution — this series covers their deployment: the environments that supply experience, the memory that survives an episode, and the loop that improves itself. The two are meant to be read together; cross-references point across the boundary.
Who writes this
Written in June 2026 by Datt Goswami. If something here is wrong, unclear, or worth arguing about, reach out by email at dattgoswami@gmail.com, on Twitter / X or LinkedIn, or through dattgoswami.com.
Some of what this guide describes I also build: cl-agent, an open-source continual-learning substrate for coding agents, and Ferrum Cloud, its productized form — a capture → replay → distill → evaluate loop that plugs into the harnesses you already run (Codex, Aider, SWE-agent, OpenHands) and makes them improve over time. It is early and bootstrapped; the waitlist is open.
The full index
1. The Harness Is the Product
2. The Agent Evaluation Crisis
3. Environments Are the Bottleneck
4. Agents That Learn on the Job
5. The Memory Stack
6. Tools, Skills, and the Action Interface
7. Planning and the Myopia Problem
8. Multi-Agent Systems and Their Failure Modes
9. Software Engineering Agents: The Proving Ground
10. Securing the Agentic Perimeter
11. Agent Ops: Running Agents in Production
12. Self-Improving Agents