About the series · Harness Engineering

The premise

Two teams deploy the same frontier model. One ships an agent that resolves most of its tasks; the other ships one that stalls, loops, and quietly fails. Nothing about the model varied — something else did almost all of the work. That something is the harness: the system wrapped around a model that assembles its context, routes its tools, verifies its outputs, remembers across steps, and decides when to stop.
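To make the five responsibilities concrete, here is a minimal sketch of a harness loop in Python. It is an illustration under stated assumptions, not the series' implementation: the `Harness` class, the `"tool arg"` and `"FINAL answer"` action format, and the toy model and calculator are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Harness:
    """Hypothetical harness: everything around the model call."""
    model: Callable[[str], str]             # assumed interface: prompt in, action text out
    tools: dict[str, Callable[[str], str]]  # tool name -> tool function
    verify: Callable[[str], bool]           # accepts or rejects a proposed final answer
    max_steps: int = 8                      # stopping rule: hard step budget
    memory: list[str] = field(default_factory=list)

    def assemble_context(self, task: str) -> str:
        # Context assembly: the task plus everything remembered so far.
        return "\n".join([f"TASK: {task}", *self.memory])

    def run(self, task: str) -> Optional[str]:
        for _ in range(self.max_steps):
            action = self.model(self.assemble_context(task))
            name, _, arg = action.partition(" ")
            if name == "FINAL":
                if self.verify(arg):        # output verification
                    return arg
                self.memory.append(f"REJECTED: {arg}")
                continue
            tool = self.tools.get(name)     # tool routing
            result = tool(arg) if tool else f"unknown tool {name!r}"
            self.memory.append(f"{action} -> {result}")  # memory across steps
        return None                         # budget exhausted: stop instead of looping


def toy_model(context: str) -> str:
    # Stand-in for a real model: call the calculator once, then answer.
    for line in context.splitlines():
        if line.startswith("calc "):
            return "FINAL " + line.rsplit(" -> ", 1)[1]
    return "calc 6*7"


def calc(expr: str) -> str:
    a, b = expr.split("*")
    return str(int(a) * int(b))


harness = Harness(model=toy_model, tools={"calc": calc}, verify=lambda ans: ans.isdigit())
answer = harness.run("What is 6 times 7?")
print(answer)
```

The same `toy_model` wrapped in a harness with no step budget or no verifier would be the stalling, looping agent of the second team; the model did not change, only the code around it.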

The organizing claim of the series is that a deployed agent's capability factors into model quality and harness quality — and in 2026 the harness term has the steeper gradient. Spending a month improving the harness buys more deployed capability than waiting for the next model. Each essay follows one thread of that argument from its primary sources to a claim you can argue with, and ends in a deployable artifact.
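One way to write the claim down, as a sketch only: the symbols below are assumptions made for this page and are not necessarily the shared notation the essays define. Let C = f(M, H) be deployed capability as a function of model quality M and harness quality H, let ΔH be what a month of harness work achieves, and let ΔM be the gain from the next model release.

```latex
\Delta C_{\text{harness}} = f(M,\, H + \Delta H) - f(M,\, H), \qquad
\Delta C_{\text{model}} = f(M + \Delta M,\, H) - f(M,\, H)

\text{Claim (2026):} \quad \Delta C_{\text{harness}} > \Delta C_{\text{model}}
```

Read this way, "steeper gradient" is a statement about capability gained per unit of time spent, not about the absolute importance of either term.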

How to read it

The twelve essays are organized into four parts and build on one another, but each stands on its own. Start at The Harness Is the Product for the argument in order, browse the contents, or jump to whatever pulls you. Every essay carries its anchor papers up top, key takeaways inline, full references at the end, and a shared notation defined once and reused throughout.

Part I · The Problem — Name the discipline; expose the measurement crisis. (1, 2)
Part II · The Engine — Build the agent's training loop — environments, learning, memory, tools. (3, 4, 5, 6)
Part III · The Build — Navigate the hard problems — planning, coordination, the proving ground. (7, 8, 9)
Part IV · The Deployment Frontier — Ship it — security, operations, and autonomous improvement. (10, 11, 12)

The research threads

Ten tags trace the threads that weave through the series:

AG — Agents & Harness
AR — Agentic RL
CL — Continual Learning
ME — Memory
TO — Tools & Skills
MA — Multi-Agent
SW — Software Engineering
EV — Evaluation
SE — Security
FN — Foundations

On the illustrations

The pictures are deliberately diagrammatic: hand-built inline SVG (architecture sketches, scaling curves, audit tables, threat loops) and pixel-art silhouettes that re-tint in dark mode. Every figure is grounded: each datapoint, curve inflection, and timeline node maps to a result in a cited paper. The aim is the same as that of the prose: a clear silhouette of an idea.

The sister series

Harness Engineering is the deployment-layer companion to Continual Intelligence. Where Continual Intelligence covers the mechanisms of learning — plasticity, the big-world hypothesis, reasoning, open-ended evolution — this series covers their deployment: the environments that supply experience, the memory that survives an episode, and the loop that improves itself. The two are meant to be read together; cross-references point across the boundary.

Who writes this

Written in June 2026 by Datt Goswami. If something here is wrong, unclear, or worth arguing about, reach out by email at dattgoswami@gmail.com, on Twitter / X or LinkedIn, or through dattgoswami.com.

Some of what this series describes I also build: cl-agent, an open-source continual-learning substrate for coding agents, and Ferrum Cloud, its productized form. Ferrum Cloud is a capture → replay → distill → evaluate loop that plugs into the harnesses you already run (Codex, Aider, SWE-agent, OpenHands) and makes them improve over time. It is early and bootstrapped; the waitlist is open.