The Two Meanings of “Harness”

The word “harness” has two distinct meanings in AI. As Tejas Kumar explains, conflating them causes real confusion [^talk]. This repo has an eval runner (test suite for models) alongside an agent harness (tools + environment).

Eval Runner (Older Meaning)

Originating in ML research around 2021 with EleutherAI’s LM Evaluation Harness ¹, an eval runner measures model quality against known answers. (The EleutherAI project calls itself a “harness,” which is the source of the terminology collision this chapter addresses.)

dataset → model → scorer → pass/fail → summary

It is a test suite for models. You feed it a fixed set of inputs with expected outputs, run one or more models, and get scores. No tools, no loops, no guardrails.

Agent Harness (Newer Meaning)

Emerged in agentic engineering around 2026 ² ³. An agent harness enables a model to act in the real world — not just answer one prompt, but do actual work in a loop.

task → [tools + context + guardrails + loop + verify] → result

It has tools, state, guardrails, and verification. The model iterates until it finishes the task or a guardrail fires.

Side-by-Side Comparison

	Eval runner	Agent harness
Origin	ML research, 2021	Agentic engineering, 2026
Example	EleutherAI’s LM Evaluation Harness	Claude Agent SDK, this repo
Purpose	Measure model quality against known answers	Enable a model to act in the real world
Input	Fixed dataset	Open-ended task
Output	Scores and pass/fail	Answer + tool call log
Loop	One call per test case	Iterates until done or guardrail fires
Tools	None	Yes — the whole point
Guardrails	Not needed	Essential
State	Stateless	Conversation history across turns

Both are valuable. The eval runner tells you how good a model is. The agent harness tells you how well a model can do work.

lm-evaluation-harness — EleutherAI, 2021 ↩
“My AI Adoption Journey” — Mitchell Hashimoto, February 2026 ↩
“Harness engineering: leveraging Codex in an agent-first world” — OpenAI, February 2026 ↩

Keyboard shortcuts

AI Harness Engineering

The Two Meanings of “Harness”

Eval Runner (Older Meaning)

Agent Harness (Newer Meaning)

Side-by-Side Comparison