The Two Meanings of “Harness”
The word “harness” has two distinct meanings in AI. As Tejas Kumar explains, conflating them causes real confusion [^talk]. This repo has an eval runner (test suite for models) alongside an agent harness (tools + environment).
Eval Runner (Older Meaning)
Originating in ML research around 2021 with EleutherAI’s LM Evaluation Harness 1, an eval runner measures model quality against known answers. (The EleutherAI project calls itself a “harness,” which is the source of the terminology collision this chapter addresses.)
dataset → model → scorer → pass/fail → summary
It is a test suite for models. You feed it a fixed set of inputs with expected outputs, run one or more models, and get scores. No tools, no loops, no guardrails.
Agent Harness (Newer Meaning)
Emerged in agentic engineering around 2026 2 3. An agent harness enables a model to act in the real world — not just answer one prompt, but do actual work in a loop.
task → [tools + context + guardrails + loop + verify] → result
It has tools, state, guardrails, and verification. The model iterates until it finishes the task or a guardrail fires.
Side-by-Side Comparison
| Eval runner | Agent harness | |
|---|---|---|
| Origin | ML research, 2021 | Agentic engineering, 2026 |
| Example | EleutherAI’s LM Evaluation Harness | Claude Agent SDK, this repo |
| Purpose | Measure model quality against known answers | Enable a model to act in the real world |
| Input | Fixed dataset | Open-ended task |
| Output | Scores and pass/fail | Answer + tool call log |
| Loop | One call per test case | Iterates until done or guardrail fires |
| Tools | None | Yes — the whole point |
| Guardrails | Not needed | Essential |
| State | Stateless | Conversation history across turns |
Both are valuable. The eval runner tells you how good a model is. The agent harness tells you how well a model can do work.
-
lm-evaluation-harness — EleutherAI, 2021 ↩
-
“My AI Adoption Journey” — Mitchell Hashimoto, February 2026 ↩
-
“Harness engineering: leveraging Codex in an agent-first world” — OpenAI, February 2026 ↩