The Agent Harness: Giving Models an Environment

The agent harness enables a model to act in the real world. It provides tools, manages state, enforces limits, and verifies outcomes. This is the newer meaning of “harness” in AI, and it is the focus of Tejas Kumar’s talk [1].

task → [tools + context + guardrails + loop + verify] → result

The full implementation lives at the project root. All code excerpts in this chapter are from the Rust implementation in this repo.

The harness is orchestrated by src/harness.rs, which ties together the browser, tools, guardrails, login handler, and agent loop into a single retry-capable pipeline.

Tool Registry

src/tools.rs defines the tools available to the model. Each tool has:

  • A definition (name, description, parameter schema in OpenAI format)
  • An execute function (the actual implementation)

pub struct Tool {
    pub definition: ToolDefinition,
    pub execute: ToolExecute,
}

Tools in this repo:

  • browser_navigate(url) — go to a URL
  • browser_url() — get current URL (detect redirects to login pages)
  • browser_get_text() — get visible page text
  • browser_fill(selector, value) — fill an input field
  • browser_click(selector) — click an element, wait for navigation
  • browser_get_stories() — structured list of HN stories with IDs and voted status
  • browser_has_class(selector, className) — check CSS class (verify upvote state)

The critical architectural decision: tools are created with a create_tools(session, upvote_state) function that binds them to a specific environment session and shared state. Tools don’t manage the browser. They don’t know about the browser lifecycle. The harness injects the session and upvote tracking state into the tools at construction time.
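Binding at construction time can be sketched as follows. This is a minimal illustration, not the repo's code: `Session`, `Args`, the simplified `Tool` (a name plus a closure, with no parameter schema), and the `String`-based upvote state are all hypothetical stand-ins for the real types in src/tools.rs.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for the browser session that the harness owns.
pub struct Session {
    pub current_url: Mutex<String>,
}

/// Hypothetical argument map; the real tools take a JSON parameter object.
pub type Args = HashMap<String, String>;

/// Hypothetical shared state (the chapter's real type wraps an UpvotedStory).
pub type UpvoteState = Arc<Mutex<Option<String>>>;

/// Simplified tool: a name plus an execute closure.
pub struct Tool {
    pub name: &'static str,
    pub execute: Box<dyn Fn(&Args) -> Result<String, String>>,
}

/// Builds tools whose closures capture the session handed in by the caller.
/// The tools never open or close the session themselves.
pub fn create_tools(session: Arc<Session>, _upvote_state: UpvoteState) -> Vec<Tool> {
    let nav_session = Arc::clone(&session);
    let url_session = Arc::clone(&session);
    vec![
        Tool {
            name: "browser_navigate",
            execute: Box::new(move |args: &Args| -> Result<String, String> {
                let url = args.get("url").ok_or("missing `url` argument")?;
                *nav_session.current_url.lock().unwrap() = url.clone();
                Ok(format!("navigated to {url}"))
            }),
        },
        Tool {
            name: "browser_url",
            execute: Box::new(move |_args: &Args| -> Result<String, String> {
                Ok(url_session.current_url.lock().unwrap().clone())
            }),
        },
    ]
}
```

Because each closure holds only an `Arc` to the session, the caller decides when the session is created and dropped, and the same tool definitions can be rebuilt against a fresh session on every attempt.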

Upvote Detection

The browser_click tool in src/tools.rs also detects HN upvote clicks. After each click, it parses the selector for up_<STORYID> patterns and checks whether the current URL is on news.ycombinator.com/news. If both conditions match, it records the story ID into the shared Arc<Mutex<Option<UpvotedStory>>> state, which the guardrails read to stop the loop on success.
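The two checks described above might look roughly like this. It is a sketch under assumptions: the `story_id` field on `UpvotedStory`, the helper names, and the rule that a story ID is a run of ASCII digits are all hypothetical, and the substring search for `up_` is deliberately naive.

```rust
use std::sync::{Arc, Mutex};

/// Field layout is assumed; the chapter only names the type.
#[derive(Debug, Clone, PartialEq)]
pub struct UpvotedStory {
    pub story_id: String,
}

pub type UpvoteState = Arc<Mutex<Option<UpvotedStory>>>;

/// Extracts the digits that follow `up_` in a selector such as `#up_12345`.
/// Returns None when the selector has no `up_<digits>` part.
pub fn story_id_from_selector(selector: &str) -> Option<String> {
    let start = selector.find("up_")? + "up_".len();
    let id: String = selector[start..]
        .chars()
        .take_while(|c| c.is_ascii_digit())
        .collect();
    if id.is_empty() { None } else { Some(id) }
}

/// True when the URL points at the HN front page listing.
pub fn is_hn_news_page(url: &str) -> bool {
    url.contains("news.ycombinator.com/news")
}

/// Records the story only when both conditions hold; returns whether it did.
pub fn record_upvote(selector: &str, current_url: &str, state: &UpvoteState) -> bool {
    match story_id_from_selector(selector) {
        Some(story_id) if is_hn_news_page(current_url) => {
            *state.lock().unwrap() = Some(UpvotedStory { story_id });
            true
        }
        _ => false,
    }
}
```

A click on `#up_42` while the page is still on the login screen leaves the state untouched, so the guardrail does not stop the loop prematurely.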

Model Client

src/model.rs provides a configurable ModelClient that talks to Ollama’s OpenAI-compatible endpoint:

pub struct ModelClient { /* ... */ }

impl ModelClient {
    /// Defaults to http://localhost:11434.
    pub fn new() -> Self { /* ... */ }
    pub fn with_seed(mut self, seed: u64) -> Self { /* ... */ }
    pub fn with_temperature(mut self, temperature: f32) -> Self { /* ... */ }
    pub fn with_max_tokens(mut self, max_tokens: u32) -> Self { /* ... */ }
}

Swap models by changing one string in src/harness.rs. No API keys needed — Ollama runs entirely locally. Override the endpoint with the OLLAMA_URL environment variable.
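One plausible shape for such a builder is sketched below. The field names, the use of `Option` for unset parameters, and the `chat_url` helper are assumptions for illustration; only the method names, the default address, the OLLAMA_URL variable, and the /v1/chat/completions path come from this chapter.

```rust
use std::env;

/// Field layout is assumed; the real struct body is not shown in the chapter.
#[derive(Debug, Clone, PartialEq)]
pub struct ModelClient {
    pub base_url: String,
    pub seed: Option<u64>,
    pub temperature: Option<f32>,
    pub max_tokens: Option<u32>,
}

impl ModelClient {
    /// Reads OLLAMA_URL if set, otherwise falls back to the local default.
    pub fn new() -> Self {
        let base_url = env::var("OLLAMA_URL")
            .unwrap_or_else(|_| "http://localhost:11434".to_string());
        Self { base_url, seed: None, temperature: None, max_tokens: None }
    }

    /// Each builder method takes `self` by value and returns it, so calls chain.
    pub fn with_seed(mut self, seed: u64) -> Self {
        self.seed = Some(seed);
        self
    }

    pub fn with_temperature(mut self, temperature: f32) -> Self {
        self.temperature = Some(temperature);
        self
    }

    pub fn with_max_tokens(mut self, max_tokens: u32) -> Self {
        self.max_tokens = Some(max_tokens);
        self
    }

    /// Hypothetical helper: the OpenAI-compatible chat endpoint for this client.
    pub fn chat_url(&self) -> String {
        format!("{}/v1/chat/completions", self.base_url.trim_end_matches('/'))
    }
}
```

A call such as `ModelClient::new().with_seed(7).with_temperature(0.0)` then yields a client with a fixed seed and temperature, which is useful when comparing runs.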

Context Management

src/context.rs builds the initial message array for a new task:

pub fn create_context(task: &str) -> Vec<Message> {
    vec![
        // `..` stands for the remaining Message fields, which this excerpt elides.
        Message { role: "system".into(), content: Some(SYSTEM_PROMPT.into()), .. },
        Message { role: "user".into(), content: Some(task.into()), .. },
    ]
}

The loop appends tool call results and model responses to this array. To limit context rot, the loop trims the array once it grows past the MAX_CONTEXT_MESSAGES constant in src/agent_loop.rs; a more sophisticated harness would also compact old messages.
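A simple trimming policy could be sketched like this. The policy itself (keep the system prompt and the task, then the most recent messages) is an assumption, as are the two-field `Message` and the constant's value; the repo's actual trimming logic is not shown in this chapter.

```rust
/// Reduced message type for the sketch; the real Message has more fields.
#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: String,
    pub content: Option<String>,
}

/// Assumed value; the chapter names the constant but not its size.
pub const MAX_CONTEXT_MESSAGES: usize = 6;

/// Keeps the first two messages (system prompt and task) plus the newest
/// `max - 2` messages, dropping everything in between.
pub fn trim_context(messages: &mut Vec<Message>, max: usize) {
    let keep_head = 2;
    if max < keep_head || messages.len() <= max {
        return; // nothing to trim, or the limit is too small to honor
    }
    let drop_end = messages.len() - (max - keep_head);
    messages.drain(keep_head..drop_end);
}
```

One caveat with a cut like this: in the OpenAI message format a tool result is expected to follow the assistant message that requested it, so a production version would need to cut on a boundary that keeps those pairs together.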

Guardrails

src/guardrails.rs provides composable guardrail functions that run before every loop iteration:

  • max_iterations(limit): stop if the agent exceeds N loop iterations
  • max_messages(limit): stop if the conversation exceeds M messages
  • stop_after_upvote(state): stop once the shared upvote state is set
  • combine_guardrails(vec): run multiple guardrails, first Stop wins
  • default_guardrails(state): returns all three combined with sensible defaults

Each guardrail is a GuardrailFn, a closure held as Arc<dyn Fn(&GuardrailInput) -> GuardrailResult>. Guardrails catch structural failures such as runaway agents and infinite loops, and they detect successful upvotes via shared state.
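The composition can be sketched as below. The fields of `GuardrailInput`, the `Continue`/`Stop(String)` variants, the `>=` comparisons, and the `String` upvote state are assumptions; the function names and the "first Stop wins" rule follow the list above.

```rust
use std::sync::{Arc, Mutex};

/// Assumed inputs; the real struct may carry more.
pub struct GuardrailInput {
    pub iteration: usize,
    pub message_count: usize,
}

#[derive(Debug, Clone, PartialEq)]
pub enum GuardrailResult {
    Continue,
    Stop(String),
}

pub type GuardrailFn = Arc<dyn Fn(&GuardrailInput) -> GuardrailResult + Send + Sync>;

pub fn max_iterations(limit: usize) -> GuardrailFn {
    Arc::new(move |input: &GuardrailInput| {
        if input.iteration >= limit {
            GuardrailResult::Stop(format!("reached {limit} iterations"))
        } else {
            GuardrailResult::Continue
        }
    })
}

pub fn max_messages(limit: usize) -> GuardrailFn {
    Arc::new(move |input: &GuardrailInput| {
        if input.message_count >= limit {
            GuardrailResult::Stop(format!("reached {limit} messages"))
        } else {
            GuardrailResult::Continue
        }
    })
}

/// Stops as soon as some tool has written a story ID into the shared state.
pub fn stop_after_upvote(state: Arc<Mutex<Option<String>>>) -> GuardrailFn {
    Arc::new(move |_input: &GuardrailInput| {
        let upvoted = state.lock().unwrap().clone();
        match upvoted {
            Some(id) => GuardrailResult::Stop(format!("upvoted story {id}")),
            None => GuardrailResult::Continue,
        }
    })
}

/// Runs guardrails in order and returns the first Stop, if any.
pub fn combine_guardrails(guardrails: Vec<GuardrailFn>) -> GuardrailFn {
    Arc::new(move |input: &GuardrailInput| {
        for guardrail in &guardrails {
            if let GuardrailResult::Stop(reason) = guardrail(input) {
                return GuardrailResult::Stop(reason);
            }
        }
        GuardrailResult::Continue
    })
}
```

Because every guardrail has the same type, the combined guardrail is itself a `GuardrailFn` and can be nested inside another combination.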

Agent Loop

src/agent_loop.rs is the orchestration engine:

loop:
  1. trim context (if over MAX_CONTEXT_MESSAGES)
  2. check guardrails → if Stop, return immediately
  3. call model with current messages + tools
  4. if model says "stop": return answer
  5. if model calls tools:
       for each tool call:
         execute the tool
         capture ToolEvent in trace
         append result to messages
  6. run login handler (if page is /login or /vote, auto-fill credentials
     and inject a "harness_auto_login" event + user message)
  7. log iteration to trace with all tool events
  8. loop back to step 1

The loop tracks a full Vec<LoopIteration> trace, where each iteration contains Vec<ToolEvent> with the tool name, arguments, and result. This trace is used by harness.rs for verification and structured output.

The loop sends messages in native OpenAI format directly to Ollama’s /v1/chat/completions endpoint. No message conversion layer needed.
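The control flow of the loop can be sketched without a model or a browser. Everything here is a simplified assumption: messages are plain strings, the model and tool executor are closures, the only guardrail is an iteration cap, and context trimming and the login handler are omitted.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct ToolEvent {
    pub tool: String,
    pub args: String,
    pub result: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct LoopIteration {
    pub index: usize,
    pub tool_events: Vec<ToolEvent>,
}

pub struct ToolCall {
    pub name: String,
    pub args: String,
}

/// What the (stubbed) model returns on each turn.
pub enum ModelReply {
    Stop(String),
    ToolCalls(Vec<ToolCall>),
}

#[derive(Debug, PartialEq)]
pub enum LoopOutcome {
    Answer(String),
    StoppedByGuardrail(String),
}

pub fn run_loop(
    messages: &mut Vec<String>,
    max_iterations: usize,
    mut model: impl FnMut(&[String]) -> ModelReply,
    mut execute_tool: impl FnMut(&ToolCall) -> String,
) -> (LoopOutcome, Vec<LoopIteration>) {
    let mut trace = Vec::new();
    let mut index = 0;
    loop {
        // Guardrail check happens before the model is called.
        if index >= max_iterations {
            let reason = format!("max iterations ({max_iterations})");
            return (LoopOutcome::StoppedByGuardrail(reason), trace);
        }
        match model(messages.as_slice()) {
            ModelReply::Stop(answer) => return (LoopOutcome::Answer(answer), trace),
            ModelReply::ToolCalls(calls) => {
                let mut tool_events = Vec::new();
                for call in &calls {
                    let result = execute_tool(call);
                    // Feed the result back so the next model call can see it.
                    messages.push(format!("tool:{}:{}", call.name, result));
                    tool_events.push(ToolEvent {
                        tool: call.name.clone(),
                        args: call.args.clone(),
                        result,
                    });
                }
                trace.push(LoopIteration { index, tool_events });
            }
        }
        index += 1;
    }
}
```

Returning the trace alongside the outcome is what lets a caller verify what actually happened instead of trusting the model's final answer.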

Login Handler

src/login_handler.rs provides create_login_handler(session), which returns a closure that runs after every batch of tool calls. It checks the current URL:

  • If the URL contains /login or /vote (HN redirects unauthenticated upvote attempts to the login page), it auto-fills the credentials and submits the form via input[name='acct'], input[name='pw'], and input[type='submit'].

When triggered, it returns a ToolEvent { tool: "harness_auto_login", ... }, which the loop injects into the trace. The loop also appends a user message telling the model that it is now authenticated and should navigate back to HN.
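A handler of this shape might be sketched as follows. The `Session` trait, the `FakeSession` test double, and passing the credentials as arguments are assumptions made so the example runs without a browser; the selectors and the event name come from the description above.

```rust
use std::sync::{Arc, Mutex};

#[derive(Debug, Clone, PartialEq)]
pub struct ToolEvent {
    pub tool: String,
    pub args: String,
    pub result: String,
}

/// Hypothetical minimal session interface.
pub trait Session {
    fn url(&self) -> String;
    fn fill(&self, selector: &str, value: &str);
    fn click(&self, selector: &str);
}

/// Hypothetical test double that records actions instead of driving Chrome.
pub struct FakeSession {
    pub url: String,
    pub actions: Mutex<Vec<String>>,
}

impl Session for FakeSession {
    fn url(&self) -> String {
        self.url.clone()
    }
    fn fill(&self, selector: &str, _value: &str) {
        self.actions.lock().unwrap().push(format!("fill {selector}"));
    }
    fn click(&self, selector: &str) {
        self.actions.lock().unwrap().push(format!("click {selector}"));
    }
}

pub fn needs_login(url: &str) -> bool {
    url.contains("/login") || url.contains("/vote")
}

/// Returns a closure the loop can call after each batch of tool calls.
/// It yields Some(event) only when it actually submitted the login form.
pub fn create_login_handler<S: Session>(
    session: Arc<S>,
    user: String,
    password: String,
) -> impl Fn() -> Option<ToolEvent> {
    move || {
        let url = session.url();
        if !needs_login(&url) {
            return None;
        }
        session.fill("input[name='acct']", &user);
        session.fill("input[name='pw']", &password);
        session.click("input[type='submit']");
        Some(ToolEvent {
            tool: "harness_auto_login".to_string(),
            args: url,
            result: "submitted login form".to_string(),
        })
    }
}
```

Keeping the credentials inside the harness closure means they never appear in the model's context or in its tool arguments.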

The Harness Lifecycle

The harness (src/harness.rs) owns the full lifecycle with retry logic. This is the architectural decision that makes it a real harness rather than just a loop with tools:

run_harness()
  ├── create shared UpvotedStory state    ← tools write, guardrails read
  ├── create default_guardrails(state)    ← includes stop_after_upvote
  │
  ├── attempt 1..MAX_ATTEMPTS:
  │   ├── BrowserSession::open()           ← harness opens the environment
  │   ├── create_tools(session, state)     ← tools bound to session + state
  │   ├── create_login_handler(session)    ← auto-login on redirect
  │   ├── create_context(TASK)             ← fresh context for this task
  │   ├── run_loop(guardrails, login)      ← loop runs inside the environment
  │   ├── verify_successful_upvote(result) ← check trace for up_ click
  │   ├── if verified: return success
  │   └── [Browser closed on Drop]         ← always, via RAII
  │
  └── return result with verification

The harness opens the browser, creates tools bound to that browser page and shared state, creates the login handler and guardrails, runs the loop with retry logic, verifies the outcome via trace inspection, and cleans up via Rust’s RAII drop semantics when BrowserSession is dropped.
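The retry structure can be sketched generically. In this illustration the environment type `E`, the two closures, and `HarnessResult` are hypothetical; the attempt closure stands in for "create tools, run the loop, verify", and the value of MAX_ATTEMPTS is assumed.

```rust
/// Assumed value; the chapter names the constant without stating it here.
pub const MAX_ATTEMPTS: u32 = 3;

#[derive(Debug, PartialEq)]
pub struct HarnessResult {
    pub verified: bool,
    pub attempts: u32,
}

/// Opens a fresh environment per attempt, runs the attempt inside it, and
/// stops at the first verified success.
pub fn run_harness<E>(
    mut open_env: impl FnMut() -> Result<E, String>,
    mut run_attempt: impl FnMut(&E, u32) -> bool,
) -> HarnessResult {
    for attempt in 1..=MAX_ATTEMPTS {
        // A failure to open the environment consumes the attempt.
        let env = match open_env() {
            Ok(env) => env,
            Err(_) => continue,
        };
        let verified = run_attempt(&env, attempt);
        if verified {
            return HarnessResult { verified: true, attempts: attempt };
        }
        // `env` goes out of scope here, so it is dropped before the next attempt.
    }
    HarnessResult { verified: false, attempts: MAX_ATTEMPTS }
}
```

The loop body owns the environment, so every attempt starts from a clean one and no attempt can leak its environment into the next.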

Verification

verify_successful_upvote inspects the trace for an upvote click, then uses the live browser session to check the page DOM. HN removes the upvote arrow element entirely after a successful vote, so an element-not-found error (or the presence of the nosee class) confirms the vote was registered by HN’s servers. The harness retries up to 3 times if verification fails, reusing the same shared upvote state across attempts.
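The two-part check can be sketched as a pure function. The `ArrowProbe` enum is a hypothetical stand-in for the live DOM query, the trace types are reduced to the fields used here, and matching on the substring `up_` in the click arguments is an assumption about how the selector is stored.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct ToolEvent {
    pub tool: String,
    pub args: String,
    pub result: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct LoopIteration {
    pub tool_events: Vec<ToolEvent>,
}

/// Hypothetical result of probing the upvote arrow in the live page.
pub enum ArrowProbe {
    NotFound,
    HasNosee,
    Visible,
}

/// Finds the first browser_click whose arguments mention an `up_` selector.
pub fn find_upvote_click(trace: &[LoopIteration]) -> Option<&ToolEvent> {
    trace
        .iter()
        .flat_map(|iteration| iteration.tool_events.iter())
        .find(|event| event.tool == "browser_click" && event.args.contains("up_"))
}

/// Verified only when the trace shows the click AND the page confirms it.
pub fn verify_successful_upvote(trace: &[LoopIteration], probe: ArrowProbe) -> bool {
    find_upvote_click(trace).is_some()
        && matches!(probe, ArrowProbe::NotFound | ArrowProbe::HasNosee)
}
```

Requiring both signals guards against two failure modes: a click that never reached the server, and a page that looks voted for reasons unrelated to this run.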

Browser Environment

src/browser.rs provides the BrowserSession struct — a thin wrapper around headless_chrome:

// Signatures only; method bodies are omitted.
impl BrowserSession {
    pub fn open() -> Result<Self>;
    pub fn navigate(&self, url: &str) -> Result<String>;
    pub fn get_url(&self) -> Result<String>;
    pub fn get_text(&self) -> Result<String>;
    pub fn fill(&self, selector: &str, value: &str) -> Result<String>;
    pub fn click(&self, selector: &str) -> Result<String>;
    pub fn get_stories(&self) -> Result<String>;          // HN-specific
    pub fn has_class(&self, selector: &str, class_name: &str) -> Result<String>;
}

Each harness run gets one isolated browser page via Chrome’s DevTools Protocol. When the run ends, whether it succeeded, failed, or panicked, the browser closes via Rust’s RAII drop semantics (a panic still runs Drop as long as it unwinds rather than aborts).
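The cleanup guarantee can be demonstrated with a small guard type. `BrowserGuard`, `run_once`, and the shared counter are hypothetical; the counter stands in for "the browser process was closed" so the behavior can be observed without Chrome.

```rust
use std::panic::{self, AssertUnwindSafe};
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical guard: dropping it counts as closing the browser.
pub struct BrowserGuard {
    closed: Arc<AtomicUsize>,
}

impl BrowserGuard {
    pub fn open(closed: Arc<AtomicUsize>) -> Self {
        Self { closed }
    }
}

impl Drop for BrowserGuard {
    fn drop(&mut self) {
        self.closed.fetch_add(1, Ordering::SeqCst);
    }
}

/// Opens a guard, then returns `outcome` or panics. The guard is dropped on
/// every path because it is a local variable of this function.
pub fn run_once(
    closed: Arc<AtomicUsize>,
    outcome: Result<(), &'static str>,
    should_panic: bool,
) -> Result<(), &'static str> {
    let _browser = BrowserGuard::open(closed);
    if should_panic {
        panic!("simulated crash inside the run");
    }
    outcome
}

/// Runs a panicking attempt and reports whether the panic was caught.
/// This relies on the default unwinding panic strategy.
pub fn closes_even_on_panic(closed: Arc<AtomicUsize>) -> bool {
    panic::catch_unwind(AssertUnwindSafe(|| {
        let _ = run_once(closed, Ok(()), true);
    }))
    .is_err()
}
```

No explicit `close()` call appears anywhere, which is the point: there is no code path on which a caller can forget it.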


  1. “Harnesses in AI: A Deep Dive” — Tejas Kumar, AI Engineer World’s Fair, May 2026