The Agent Harness: Giving Models an Environment
The agent harness enables a model to act in the real world. It provides tools, manages state, enforces limits, and verifies outcomes. This is the newer meaning of “harness” in AI, and the focus of Tejas Kumar’s talk [1].
task → [tools + context + guardrails + loop + verify] → result
The full implementation lives at the project root. All code excerpts in this chapter are from the Rust implementation in this repo.
The harness is orchestrated by src/harness.rs, which ties together the
browser, tools, guardrails, login handler, and agent loop into a single
retry-capable pipeline.
Tool Registry
src/tools.rs defines the tools available to the model. Each tool has:
- A definition (name, description, parameter schema in OpenAI format)
- An execute function (the actual implementation)
    pub struct Tool {
        pub definition: ToolDefinition,
        pub execute: ToolExecute,
    }
Tools in this repo:
- browser_navigate(url): go to a URL
- browser_url(): get the current URL (detect redirects to login pages)
- browser_get_text(): get visible page text
- browser_fill(selector, value): fill an input field
- browser_click(selector): click an element, wait for navigation
- browser_get_stories(): structured list of HN stories with IDs and voted status
- browser_has_class(selector, className): check a CSS class (verify upvote state)
The critical architectural decision: tools are created with a
create_tools(session, upvote_state) function that binds them to a specific
environment session and shared state. Tools don’t manage the browser. They
don’t know about the browser lifecycle. The harness injects the session and
upvote tracking state into the tools at construction time.
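The binding described above can be sketched with closures that capture a shared session handle. This is a minimal illustration using only the standard library, not the repo's code: FakeSession, the string-based ToolExecute signature, and the simplified Tool fields are all assumptions made for the example.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for the real browser session.
pub struct FakeSession {
    pub current_url: Mutex<String>,
}

/// Assumed shape of a tool implementation: raw argument string in,
/// result string out. The repo's ToolExecute may be defined differently.
pub type ToolExecute = Box<dyn Fn(&str) -> Result<String, String> + Send + Sync>;

/// Simplified tool: a name plus the closure that implements it.
pub struct Tool {
    pub name: &'static str,
    pub execute: ToolExecute,
}

/// Build tools whose closures each hold a handle to the same session.
/// The tools never open or close the session; they only use it.
pub fn create_tools(session: Arc<FakeSession>) -> Vec<Tool> {
    let nav = Arc::clone(&session);
    let url = Arc::clone(&session);
    vec![
        Tool {
            name: "browser_navigate",
            execute: Box::new(move |target: &str| {
                *nav.current_url.lock().unwrap() = target.to_string();
                Ok(format!("navigated to {target}"))
            }),
        },
        Tool {
            name: "browser_url",
            execute: Box::new(move |_args: &str| Ok(url.current_url.lock().unwrap().clone())),
        },
    ]
}
```

Because every closure captures the same Arc, whoever created the session (here, the caller of create_tools) stays in charge of its lifetime.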
Upvote Detection
The browser_click tool in src/tools.rs also detects HN upvote clicks.
After each click, it parses the selector for up_<STORYID> patterns and
checks whether the current URL is on news.ycombinator.com/news. If both
conditions match, it records the story ID into the shared
Arc<Mutex<Option<UpvotedStory>>> state, which the guardrails read to
stop the loop on success.
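A rough sketch of that detection step follows. The function names, the story_id field, and the plain substring checks are assumptions for illustration; the actual parsing in src/tools.rs may differ.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical payload; the real UpvotedStory may carry more fields.
#[derive(Debug, Clone, PartialEq)]
pub struct UpvotedStory {
    pub story_id: String,
}

pub type UpvoteState = Arc<Mutex<Option<UpvotedStory>>>;

/// Extract the digits that follow "up_" in a selector such as "#up_12345".
/// This is a simple substring match, so it is deliberately permissive.
pub fn story_id_from_selector(selector: &str) -> Option<String> {
    let start = selector.find("up_")? + "up_".len();
    let id: String = selector[start..]
        .chars()
        .take_while(|c| c.is_ascii_digit())
        .collect();
    if id.is_empty() {
        None
    } else {
        Some(id)
    }
}

/// Record the upvote only when both conditions hold: the selector names a
/// story and the browser is on the HN news page. Returns true if recorded.
pub fn record_upvote(selector: &str, current_url: &str, state: &UpvoteState) -> bool {
    let on_news = current_url.contains("news.ycombinator.com/news");
    match (on_news, story_id_from_selector(selector)) {
        (true, Some(story_id)) => {
            *state.lock().unwrap() = Some(UpvotedStory { story_id });
            true
        }
        _ => false,
    }
}
```

The URL check matters because a click that lands on the login page has not registered a vote, so the shared state stays empty in that case.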
Model Client
src/model.rs provides a configurable ModelClient that talks to
Ollama’s OpenAI-compatible endpoint:
    pub struct ModelClient { /* ... */ }

    impl ModelClient {
        pub fn new() -> Self { /* defaults to http://localhost:11434 */ }
        pub fn with_seed(mut self, seed: u64) -> Self { /* ... */ }
        pub fn with_temperature(mut self, temperature: f32) -> Self { /* ... */ }
        pub fn with_max_tokens(mut self, max_tokens: u32) -> Self { /* ... */ }
    }
Swap models by changing one string in src/harness.rs. No API keys needed —
Ollama runs entirely locally. Override the endpoint with the OLLAMA_URL
environment variable.
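One plausible way to fill in those builder methods is shown below. The field names and the use of Option for unset values are assumptions; only the method signatures, the default URL, and the OLLAMA_URL override come from the text above.

```rust
use std::env;

/// Sketch of a configurable client; the field layout is assumed.
#[derive(Debug, Clone)]
pub struct ModelClient {
    pub base_url: String,
    pub seed: Option<u64>,
    pub temperature: Option<f32>,
    pub max_tokens: Option<u32>,
}

impl ModelClient {
    /// Use OLLAMA_URL when it is set, otherwise the local default.
    pub fn new() -> Self {
        let base_url = env::var("OLLAMA_URL")
            .unwrap_or_else(|_| "http://localhost:11434".to_string());
        Self { base_url, seed: None, temperature: None, max_tokens: None }
    }

    /// Each builder method takes ownership, sets one field, and returns self
    /// so that calls can be chained.
    pub fn with_seed(mut self, seed: u64) -> Self {
        self.seed = Some(seed);
        self
    }

    pub fn with_temperature(mut self, temperature: f32) -> Self {
        self.temperature = Some(temperature);
        self
    }

    pub fn with_max_tokens(mut self, max_tokens: u32) -> Self {
        self.max_tokens = Some(max_tokens);
        self
    }
}
```

A call site would then read `ModelClient::new().with_seed(42).with_temperature(0.0)`, which keeps every non-default setting visible in one expression.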
Context Management
src/context.rs builds the initial message array for a new task:
    pub fn create_context(task: &str) -> Vec<Message> {
        vec![
            Message { role: "system".into(), content: Some(SYSTEM_PROMPT.into()), .. },
            Message { role: "user".into(), content: Some(task.into()), .. },
        ]
    }
The loop appends tool call results and model responses to this array. In a more
sophisticated harness, context management would compact or trim old messages to
prevent context rot (note the MAX_CONTEXT_MESSAGES constant in
src/agent_loop.rs).
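As an illustration of what trimming could look like, the sketch below keeps the system prompt and the most recent messages and drops everything in between. The two-field Message and the trim_context function are assumptions for the example, not the repo's implementation.

```rust
/// Simplified message; the real Message has additional fields.
#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: String,
    pub content: Option<String>,
}

/// Keep messages[0] (assumed to be the system prompt) plus the newest
/// `max_messages - 1` entries. Does nothing when already within the limit.
pub fn trim_context(messages: &mut Vec<Message>, max_messages: usize) {
    if max_messages < 2 || messages.len() <= max_messages {
        return;
    }
    let keep_tail = max_messages - 1;
    let cut_end = messages.len() - keep_tail;
    // Remove the oldest non-system messages in one pass.
    messages.drain(1..cut_end);
}
```

A production version would need more care than this: in the OpenAI message format a tool result must follow the assistant message that requested it, so cutting at an arbitrary index can orphan tool results.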
Guardrails
src/guardrails.rs provides composable guardrail functions that run before
every loop iteration:
- max_iterations(limit): stop if the agent exceeds N loop iterations
- max_messages(limit): stop if the conversation exceeds M messages
- stop_after_upvote(state): stop once the shared upvote state is set
- combine_guardrails(vec): run multiple guardrails, first Stop wins
- default_guardrails(state): returns all three combined with sensible defaults

Each guardrail is a GuardrailFn, a closure stored as Arc<dyn Fn(&GuardrailInput) -> GuardrailResult>.
Guardrails catch structural failures such as runaway agents and infinite loops, and
they detect successful upvotes via shared state.
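The composition pattern can be sketched as follows. The function names and the GuardrailFn shape come from this section; the fields of GuardrailInput, the variants of GuardrailResult, and the use of a plain String for the upvote state are assumptions made to keep the example self-contained.

```rust
use std::sync::{Arc, Mutex};

/// Assumed inputs a guardrail can inspect.
pub struct GuardrailInput {
    pub iteration: usize,
    pub message_count: usize,
}

/// Assumed result type: keep going, or stop with a reason.
#[derive(Debug, Clone, PartialEq)]
pub enum GuardrailResult {
    Continue,
    Stop(String),
}

pub type GuardrailFn = Arc<dyn Fn(&GuardrailInput) -> GuardrailResult + Send + Sync>;

pub fn max_iterations(limit: usize) -> GuardrailFn {
    Arc::new(move |input: &GuardrailInput| {
        if input.iteration >= limit {
            GuardrailResult::Stop(format!("reached {limit} iterations"))
        } else {
            GuardrailResult::Continue
        }
    })
}

pub fn max_messages(limit: usize) -> GuardrailFn {
    Arc::new(move |input: &GuardrailInput| {
        if input.message_count > limit {
            GuardrailResult::Stop(format!("more than {limit} messages"))
        } else {
            GuardrailResult::Continue
        }
    })
}

/// Stops as soon as some tool has written a story ID into the shared state.
pub fn stop_after_upvote(state: Arc<Mutex<Option<String>>>) -> GuardrailFn {
    Arc::new(move |_input: &GuardrailInput| {
        let guard = state.lock().unwrap();
        match guard.as_ref() {
            Some(id) => GuardrailResult::Stop(format!("upvoted story {id}")),
            None => GuardrailResult::Continue,
        }
    })
}

/// Run guardrails in order; the first Stop wins.
pub fn combine_guardrails(guards: Vec<GuardrailFn>) -> GuardrailFn {
    Arc::new(move |input: &GuardrailInput| {
        for guard in &guards {
            if let stop @ GuardrailResult::Stop(_) = guard(input) {
                return stop;
            }
        }
        GuardrailResult::Continue
    })
}
```

Since a combined guardrail has the same type as a single one, the loop only ever needs to call one function per iteration.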
Agent Loop
src/agent_loop.rs is the orchestration engine:
    loop:
      1. trim context (if over MAX_CONTEXT_MESSAGES)
      2. check guardrails → if Stop, return immediately
      3. call model with current messages + tools
      4. if model says "stop": return answer
      5. if model calls tools:
           for each tool call:
             execute the tool
             capture ToolEvent in trace
             append result to messages
      6. run login handler (if page is /login or /vote, auto-fill credentials
         and inject a "harness_auto_login" event + user message)
      7. log iteration to trace with all tool events
      8. loop back to step 1
The loop tracks a full Vec<LoopIteration> trace, where each iteration
contains Vec<ToolEvent> with the tool name, arguments, and result. This
trace is used by harness.rs for verification and structured output.
The loop sends messages in native OpenAI format directly to Ollama’s
/v1/chat/completions endpoint. No message conversion layer needed.
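To make the control flow and the trace concrete, here is a heavily simplified sketch. It is not the code in src/agent_loop.rs: messages are plain strings, the model and tool executor are passed in as closures, guardrails are reduced to an iteration cap, and trimming and the login handler are left out. ModelReply, LoopResult, and the field names are assumptions.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct ToolEvent {
    pub tool: String,
    pub args: String,
    pub result: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct LoopIteration {
    pub index: usize,
    pub tool_events: Vec<ToolEvent>,
}

/// What the (stubbed) model can answer with on each turn.
pub enum ModelReply {
    Stop(String),
    /// Pairs of (tool name, raw arguments).
    ToolCalls(Vec<(String, String)>),
}

pub struct LoopResult {
    pub answer: Option<String>,
    pub trace: Vec<LoopIteration>,
}

pub fn run_loop(
    messages: &mut Vec<String>,
    max_iterations: usize,
    mut call_model: impl FnMut(&[String]) -> ModelReply,
    mut execute_tool: impl FnMut(&str, &str) -> String,
) -> LoopResult {
    let mut trace = Vec::new();
    for index in 0..max_iterations {
        match call_model(messages.as_slice()) {
            // The model is done: return its answer with the trace so far.
            ModelReply::Stop(answer) => return LoopResult { answer: Some(answer), trace },
            ModelReply::ToolCalls(calls) => {
                let mut tool_events = Vec::new();
                for (tool, args) in calls {
                    let result = execute_tool(&tool, &args);
                    // Feed the result back so the next model call can see it.
                    messages.push(format!("tool:{tool} -> {result}"));
                    tool_events.push(ToolEvent { tool, args, result });
                }
                trace.push(LoopIteration { index, tool_events });
            }
        }
    }
    // Iteration cap reached without a final answer.
    LoopResult { answer: None, trace }
}
```

Passing the model and the tool executor as closures makes the loop testable with scripted replies and no browser or model server.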
Login Handler
src/login_handler.rs provides create_login_handler(session), which returns
a closure that runs after every batch of tool calls. It checks the current URL:
- If the URL contains /login or /vote (HN redirects unauthenticated upvote attempts to the login page), it auto-fills the credentials and submits the form via input[name='acct'], input[name='pw'], and input[type='submit'].
When triggered, it returns a ToolEvent { tool: "harness_auto_login", ... }.
The loop injects this event into the trace and appends a user message telling
the model that it is now authenticated and should navigate back to HN.
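The shape of such a handler can be sketched as a closure factory. In this illustration the form submission is abstracted into a caller-supplied closure and the current URL is passed in as an argument, whereas the real create_login_handler takes a session; the needs_login helper and the ToolEvent fields are assumptions.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct ToolEvent {
    pub tool: String,
    pub args: String,
    pub result: String,
}

/// True when the URL looks like HN's login or vote endpoint.
pub fn needs_login(url: &str) -> bool {
    url.contains("/login") || url.contains("/vote")
}

/// Returns a handler to call after each batch of tool calls. The handler
/// yields Some(event) only when it actually attempted a login.
pub fn create_login_handler<F>(mut submit_credentials: F) -> impl FnMut(&str) -> Option<ToolEvent>
where
    F: FnMut() -> Result<(), String>,
{
    move |current_url: &str| {
        if !needs_login(current_url) {
            return None;
        }
        let result = match submit_credentials() {
            Ok(()) => "logged in".to_string(),
            Err(err) => format!("login failed: {err}"),
        };
        Some(ToolEvent {
            tool: "harness_auto_login".to_string(),
            args: current_url.to_string(),
            result,
        })
    }
}
```

Returning an Option lets the caller distinguish "nothing happened" from "the harness intervened", which is the signal for appending the extra user message.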
The Harness Lifecycle
The harness (src/harness.rs) owns the full lifecycle with retry logic.
This is the architectural decision that makes it a real harness rather than
just a loop with tools:
run_harness()
├── create shared UpvotedStory state ← tools write, guardrails read
├── create default_guardrails(state) ← includes stop_after_upvote
│
├── attempt 1..MAX_ATTEMPTS:
│ ├── BrowserSession::open() ← harness opens the environment
│ ├── create_tools(session, state) ← tools bound to session + state
│ ├── create_login_handler(session) ← auto-login on redirect
│ ├── create_context(TASK) ← fresh context for this task
│ ├── run_loop(guardrails, login) ← loop runs inside the environment
│ ├── verify_successful_upvote(result) ← check trace for up_ click
│ ├── if verified: return success
│ └── [Browser closed on Drop] ← always, via RAII
│
└── return result with verification
The harness opens the browser, creates tools bound to that browser page and
shared state, creates the login handler and guardrails, runs the loop with
retry logic, verifies the outcome via trace inspection, and cleans up via
Rust’s RAII drop semantics when BrowserSession is dropped.
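The retry-plus-cleanup structure can be illustrated with a stand-in session whose Drop implementation counts how often it was closed. FakeSession, the closure-based attempt, and the value of MAX_ATTEMPTS are assumptions for this sketch; the real harness opens a browser and runs the agent loop inside each attempt.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Assumed retry limit for this sketch.
pub const MAX_ATTEMPTS: usize = 3;

/// Hypothetical session that records every close in a shared counter.
pub struct FakeSession {
    closed: Arc<AtomicUsize>,
}

impl Drop for FakeSession {
    fn drop(&mut self) {
        // Stands in for closing the browser.
        self.closed.fetch_add(1, Ordering::SeqCst);
    }
}

/// Run up to MAX_ATTEMPTS attempts, each with a fresh session. Returns the
/// number of the first verified attempt, or None if all attempts failed.
pub fn run_harness(
    closed: &Arc<AtomicUsize>,
    mut attempt_fn: impl FnMut(usize, &FakeSession) -> bool,
) -> Option<usize> {
    for attempt in 1..=MAX_ATTEMPTS {
        let session = FakeSession { closed: Arc::clone(closed) };
        if attempt_fn(attempt, &session) {
            // Early return: `session` is still dropped on the way out.
            return Some(attempt);
        }
        // `session` goes out of scope here and is dropped before the retry.
    }
    None
}
```

No explicit close call appears anywhere, yet every session opened is closed exactly once, on both the success path and the failure path.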
Verification
verify_successful_upvote inspects the trace for an upvote click, then uses
the live browser session to check the page DOM. HN removes the upvote arrow
element entirely after a successful vote, so an element-not-found error (or the
presence of the nosee class) confirms the vote was registered by HN’s servers.
The harness retries up to 3 times if verification fails, reusing the same
shared upvote state across attempts.
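The two-stage check described above might look roughly like this. The ArrowState enum, the probe closure that stands in for the live DOM query, and the JSON-like argument strings are assumptions; the real verify_successful_upvote works against the browser session.

```rust
#[derive(Debug, Clone)]
pub struct ToolEvent {
    pub tool: String,
    pub args: String,
    pub result: String,
}

/// Hypothetical outcome of looking up the upvote arrow in the live DOM.
pub enum ArrowState {
    /// The element was not found.
    Missing,
    /// The element exists and carries the nosee class.
    Hidden,
    /// The element exists and is still clickable.
    Visible,
}

/// Stage 1: find the most recent browser_click whose arguments mention up_<id>.
pub fn clicked_upvote_id(trace: &[ToolEvent]) -> Option<String> {
    trace
        .iter()
        .rev()
        .filter(|event| event.tool == "browser_click")
        .find_map(|event| {
            let start = event.args.find("up_")? + "up_".len();
            let id: String = event.args[start..]
                .chars()
                .take_while(|c| c.is_ascii_digit())
                .collect();
            if id.is_empty() { None } else { Some(id) }
        })
}

/// Stage 2: confirm against the page that the arrow is gone or hidden.
pub fn verify_successful_upvote(
    trace: &[ToolEvent],
    probe: impl FnOnce(&str) -> ArrowState,
) -> bool {
    match clicked_upvote_id(trace) {
        Some(id) => matches!(
            probe(&format!("#up_{id}")),
            ArrowState::Missing | ArrowState::Hidden
        ),
        None => false,
    }
}
```

Checking the trace first means the DOM is only probed when the agent actually clicked an upvote arrow, so an unrelated missing element cannot produce a false positive.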
Browser Environment
src/browser.rs provides the BrowserSession struct — a thin wrapper
around headless_chrome:
    // Method signatures only; bodies omitted.
    impl BrowserSession {
        pub fn open() -> Result<Self>;
        pub fn navigate(&self, url: &str) -> Result<String>;
        pub fn get_url(&self) -> Result<String>;
        pub fn get_text(&self) -> Result<String>;
        pub fn fill(&self, selector: &str, value: &str) -> Result<String>;
        pub fn click(&self, selector: &str) -> Result<String>;
        pub fn get_stories(&self) -> Result<String>; // HN-specific
        pub fn has_class(&self, selector: &str, class_name: &str) -> Result<String>;
    }
Each harness run gets one isolated browser page via Chrome’s DevTools Protocol. When the run ends, whether it succeeded, returned an error, or panicked, the browser closes via Rust’s RAII drop semantics.
[1] “Harnesses in AI: A Deep Dive”, Tejas Kumar, AI Engineer World’s Fair, May 2026