Why Harness Engineering Matters

The Origin of the Term

In February 2026, Mitchell Hashimoto — co-founder of HashiCorp, creator of Terraform — published a blog post that gave the practice a name 1:

Whenever an agent makes a mistake, you engineer the environment so it won’t make that mistake again.

Days later, OpenAI used the same phrase to describe how they built an internal beta product 2: roughly one million lines of code, written entirely by agents with no manually written source code, shipped in five months. Their key insight:

When something failed, the fix was almost never “try harder.” Human engineers always stepped in and asked: what capability is missing, and how do we make it both legible and enforceable for the agent?

The Shift in Engineering Work

Harness engineering changes the engineer’s job, as described by both Hashimoto and OpenAI 1 2:

| Before | After |
|---|---|
| Write code | Design environments |
| Debug implementation | Specify intent |
| Fix bugs directly | Build feedback loops |
| Optimize logic | Structure constraints |

The engineer stops writing application code and starts designing the scaffolding that makes agents productive.

Three Core Components

Drawing on descriptions from Thoughtworks and OpenAI, a harness has three core components 2:

  1. Context engineering — deciding what information to include or exclude at each model call: isolation (keep subtasks separate), reduction (drop stale data to avoid context rot), retrieval (inject fresh docs or search results at the right time).

  2. Architectural constraints — enforced not just by the model, but by deterministic linters, structural tests, and guardrails the model cannot bypass.

  3. Verification and feedback loops — the harness checks outputs, runs eval steps, and if something is wrong, surfaces it so the agent or the engineer can fix it.
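The three components above can be sketched together in a few lines of Python. This is a minimal illustration under stated assumptions, not how any of the cited teams built their harness: `Note`, `build_context`, `check_constraints`, `run_harness`, `stub_agent`, and the `BANNED_MODULES` rule are all hypothetical names invented for this example, and the "agent" is a stub standing in for a real model call.

```python
import ast
from dataclasses import dataclass


@dataclass
class Note:
    subtask: str  # which subtask produced this note
    age: int      # how many turns old the note is
    text: str


def build_context(task, subtask, notes, docs, max_age=3):
    """Context engineering: isolation, reduction, retrieval."""
    # Isolation: keep only notes from the current subtask.
    # Reduction: drop notes older than max_age turns.
    kept = [n.text for n in notes if n.subtask == subtask and n.age <= max_age]
    # Retrieval: inject only docs whose key appears in the task text.
    retrieved = [body for key, body in docs.items() if key in task]
    return "\n".join([task, *kept, *retrieved])


BANNED_MODULES = {"subprocess"}  # hypothetical project rule


def check_constraints(source):
    """Architectural constraint: a deterministic check on imports."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]
    violations = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            continue
        for name in names:
            if name.split(".")[0] in BANNED_MODULES:
                violations.append(f"banned import: {name}")
    return violations


def run_harness(agent, context, max_attempts=3):
    """Verification loop: check each output and feed violations back."""
    feedback = []
    for attempt in range(1, max_attempts + 1):
        source = agent(context, feedback)
        feedback = check_constraints(source)
        if not feedback:
            return source, attempt
    raise RuntimeError(f"unresolved after {max_attempts} attempts: {feedback}")


def stub_agent(context, feedback):
    """Stand-in for a model call: corrects itself once it sees feedback."""
    if feedback:
        return "import pathlib\nprint(list(pathlib.Path('.').iterdir()))\n"
    return "import subprocess\nsubprocess.run(['ls'])\n"


notes = [
    Note("list-files", 1, "Use pathlib."),
    Note("other", 1, "unrelated"),
    Note("list-files", 9, "stale"),
]
docs = {"list": "pathlib.Path.iterdir yields directory entries.", "http": "n/a"}
context = build_context("list files in cwd", "list-files", notes, docs)
source, attempts = run_harness(stub_agent, context)  # stub passes on attempt 2
```

The design point is that the check lives outside the model: the stub's first answer is rejected by a parser-based rule, not by asking the model to behave, and the violation text becomes the input to the next attempt.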

Why It Matters Now

As Tejas Kumar notes in his talk 3:

“The name of the game with harness is reliability. It’s making sure that the agents we build do what they do, period. Irrespective of the black box model.”

Consider a user paying $20/month for Claude Pro. The model is a black box: Anthropic could serve Sonnet instead of Opus without notice, and the user would have no direct way to tell. Too many variables are outside the user's control.

Harnesses solve that. They anchor an agent in a stable environment. The goal is reliability regardless of the rented model.
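One way to picture "reliability regardless of the rented model" is an output gate that applies the same contract to whatever model answers. The sketch below is an assumption-laden illustration: `REQUIRED_KEYS`, `validate`, `gated_call`, `model_a`, and `model_b` are hypothetical, and the two model functions are stubs, not real provider APIs.

```python
import json

REQUIRED_KEYS = {"summary", "files_changed"}  # hypothetical output contract


def validate(raw):
    """Return a list of contract violations for one raw model response."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return ["response is not valid JSON"]
    if not isinstance(data, dict):
        return ["response is not a JSON object"]
    missing = sorted(REQUIRED_KEYS - set(data))
    return [f"missing key: {key}" for key in missing]


def gated_call(model, prompt):
    """Call any model, but only return output that satisfies the contract."""
    raw = model(prompt)
    problems = validate(raw)
    if problems:
        raise ValueError("; ".join(problems))
    return json.loads(raw)


# Two stand-ins for "whichever model the provider happens to serve".
def model_a(prompt):
    return json.dumps({"summary": "renamed helper", "files_changed": ["util.py"]})


def model_b(prompt):
    return "Sure! I renamed the helper in util.py."


result = gated_call(model_a, "rename the helper")  # passes the contract
problems = validate(model_b("rename the helper"))  # rejected: not JSON
```

The gate does not know or care which model produced the text. If the provider swaps models, downstream code either receives output in the agreed shape or gets an explicit failure it can act on.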

The company that builds the best harness will win, not the company with the most advanced model.


  1. “My AI Adoption Journey”, Mitchell Hashimoto, February 2026

  2. “Harness engineering: leveraging Codex in an agent-first world”, OpenAI, February 2026

  3. “Harnesses in AI: A Deep Dive”, Tejas Kumar, AI Engineer World’s Fair, May 2026