AI Workflow Audit Plan

Source
Dru Knox, Stop Prompting, Start Engineering: The “Context as Code” Shift, YouTube, February 25, 2026

This document captures actionable infrastructure changes to improve how cerebro manages AI workflows. The goal is to treat AI context and session behavior as observable, measurable systems, not as opaque chat sessions.

Mindset: From IC to Steward

When using AI agents, the human role shifts from individual contributor to tech lead. The job is no longer writing all the code directly; it is ensuring good code can be written by maintaining standards, documentation, and quality gates.

This shift is noted here but requires no tool changes. It is a framing device for the infrastructure decisions that follow.

Problem: Non-Deterministic Tooling Waste

AI agents are non-deterministic. They repeat tasks, thrash between approaches, make unnecessary tool calls, and burn tokens on false paths. Without telemetry, this waste is invisible. We need to instrument the agent’s behavior so we can detect and eliminate waste programmatically, not by watching every session.

Audit Infrastructure

1. Session Log Analysis Scripts

OpenCode stores session logs locally in SQLite (~/.local/share/opencode/opencode.db). We already read this database for dashboard rendering. We should add analysis scripts that mine session history for waste patterns.

Waste Patterns to Detect

| Pattern | Description | Signal |
| --- | --- | --- |
| Thrashing | Agent switches approaches repeatedly without progress | Same tool called >3x with similar args in one session |
| Retry loops | Agent retries after failures without changing strategy | cargo check fails, agent reruns same command |
| Over-tooling | Agent uses tools when direct reasoning suffices | File read followed by immediate file read of same content |
| Apology tax | Agent apologizes and backtracks | Phrases like “sorry”, “you’re absolutely right” in assistant messages |
| Token bloat | Context window fills with redundant planning | Repeated restatement of plan without new information |
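
As a rough illustration of how two of these signals could be computed, here is a minimal sketch. The record shape `(session_id, tool_name, args_text)`, the 0.8 similarity threshold, and the simplification of comparing every call against the first call in its group are all assumptions made for this example; none of them come from the opencode schema.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Hypothetical record shape: (session_id, tool_name, args_text).
ToolCall = tuple[str, str, str]

# Phrases taken from the table above; both apostrophe styles are covered.
APOLOGY_PHRASES = ("sorry", "you're absolutely right", "you’re absolutely right")


def similar(a: str, b: str, threshold: float = 0.8) -> bool:
    """Treat two argument strings as similar above an assumed ratio."""
    return SequenceMatcher(None, a, b).ratio() >= threshold


def thrashing_sessions(calls: list[ToolCall], limit: int = 3) -> set[str]:
    """Return sessions where one tool was called more than `limit` times
    with arguments similar to its first call in that session."""
    groups: dict[tuple[str, str], list[str]] = defaultdict(list)
    for session_id, tool, args in calls:
        groups[(session_id, tool)].append(args)

    flagged: set[str] = set()
    for (session_id, _tool), arg_list in groups.items():
        anchor = arg_list[0]
        matches = sum(1 for args in arg_list if similar(anchor, args))
        if matches > limit:
            flagged.add(session_id)
    return flagged


def apology_count(assistant_messages: list[str]) -> int:
    """Count assistant messages containing at least one apology phrase."""
    return sum(
        any(phrase in msg.lower() for phrase in APOLOGY_PHRASES)
        for msg in assistant_messages
    )
```

Substring matching on "sorry" will produce false positives (for example, quoted user text), so the count is better read as a trend indicator than as an exact figure.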

Proposed Script: cerebro audit sessions

A new CLI subcommand (or standalone script in scripts/) that:

  1. Queries the opencode database for recent sessions
  2. Parses message history for waste patterns
  3. Outputs a summary report with:
    • Sessions analyzed
    • Waste events detected (by category)
    • Estimated token cost of waste
    • Suggested configuration changes

Example output:

Sessions analyzed: 12
Thrashing events: 3 (25% of sessions)
Retry loops: 7 (58% of sessions)
Over-tooling: 2 (17% of sessions)

Top suggestion: Add `cargo check` pre-validation to agent context
  Affected sessions: 5
  Waste pattern: Agent runs `cargo check`, sees error, proposes fix,
                 runs `cargo check` again without applying fix first.
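
The summary portion of such a report could be produced along the following lines. This is a sketch only: the `session` table name is a placeholder (the real opencode.db schema should be inspected first), and the choice to report event counts alongside the share of distinct affected sessions is one reading of the example output above.

```python
import sqlite3


def count_sessions(conn: sqlite3.Connection) -> int:
    """Count sessions. The `session` table name is hypothetical."""
    return conn.execute("SELECT COUNT(*) FROM session").fetchone()[0]


def format_report(total: int, events: dict[str, list[str]]) -> str:
    """Render the summary. `events` maps a category label to the
    session id of each detected event (one entry per event)."""
    lines = [f"Sessions analyzed: {total}"]
    for label, session_ids in events.items():
        affected = len(set(session_ids))
        share = round(100 * affected / total) if total else 0
        lines.append(f"{label}: {len(session_ids)} ({share}% of sessions)")
    return "\n".join(lines)


if __name__ == "__main__":
    # In-memory stand-in for opencode.db so the sketch runs anywhere.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE session (id TEXT PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO session VALUES (?)", [(f"s{i}",) for i in range(12)]
    )
    events = {
        "Thrashing events": ["s0", "s1", "s2"],
        "Retry loops": [f"s{i}" for i in range(7)],
        "Over-tooling": ["s3", "s4"],
    }
    print(format_report(count_sessions(conn), events))
```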

2. Static Analysis for Hallucination Prevention

As we allow agents to perform more divergent and inductive reasoning, the risk of hallucinated APIs, incorrect types, and phantom dependencies increases. Static analysis is the first line of defense: it catches nonsense at compile time, before the code runs and before it gets committed.

We already enforce clippy::pedantic at the workspace level. This section proposes expanding that enforcement specifically for AI-generated code paths.

Current State

The project already forbids unwrap(), panic!(), and expect() in production code. This is an excellent baseline. We should extend this philosophy.

Proposed Additions

| Check | Purpose | How |
| --- | --- | --- |
| Import validation | Prevent phantom crate references | cargo check in CI must pass before any PR |
| Type strictness | Prevent inferred-type hallucinations | Rely on the clippy::pedantic cast lints (e.g. clippy::cast_possible_truncation) to flag lossy as conversions |
| Documentation coverage | Force agents to document public APIs | cargo doc --no-deps must pass without warnings (set RUSTDOCFLAGS="-D warnings" so warnings fail the build) |
| Dead code detection | Catch abandoned experiments | dead_code (a rustc lint, not a Clippy lint) as deny, not warn |
| TODO/FIXME enforcement | Prevent permanent placeholder code | Already exists; ensure it runs on AI-generated commits too |

Pre-Flight Hook for Agent Sessions

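Consider a lightweight script that agents run before submitting. A cargo [alias] entry cannot do this on its own: an alias expands to a single cargo subcommand, so a value chained with && would hand && to cargo fmt as an argument. An executable named cargo-preflight on PATH, however, is picked up by cargo and runs as cargo preflight:

#!/usr/bin/env sh
# Install as an executable named cargo-preflight somewhere on PATH.
# The && chain stops at the first failing step.
cargo fmt --check && cargo clippy --workspace -- -D warnings && cargo check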

Agents should be instructed to run cargo preflight before declaring a task complete. This catches hallucinations at the agent’s workstation, not in CI.

3. Configuration to Prevent Thrashing

OpenCode and similar agents accept configuration that shapes their behavior. We should codify anti-thrashing settings in the project’s AGENTS.md and explore opencode configuration options.

Token Budget Awareness

| Setting | Suggestion | Rationale |
| --- | --- | --- |
| Context window limit | Explicitly state the project’s preferred max | Prevents agents from stuffing irrelevant files into context |
| Tool call budget | Configure max sequential tool calls | Forces agent to reason rather than search |
| Retry limit | Cap retries at 2 with forced pause | Prevents infinite retry loops on flaky operations |
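
To make the retry limit concrete, the sketch below caps retries at 2 and pauses between attempts. The function name, the 5-second default pause, and the injectable `run` and `sleep` parameters are assumptions for illustration; whether opencode exposes an equivalent setting is still an open research item.

```python
import subprocess
import time
from typing import Callable, Sequence


def run_with_retry_cap(
    cmd: Sequence[str],
    max_retries: int = 2,
    pause_seconds: float = 5.0,
    run: Callable[[Sequence[str]], int] = lambda c: subprocess.run(c).returncode,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run `cmd`, retrying at most `max_retries` times with a pause
    between attempts. Returns True on the first zero exit code."""
    for attempt in range(max_retries + 1):
        if run(cmd) == 0:
            return True
        if attempt < max_retries:
            sleep(pause_seconds)  # forced pause before the next retry
    return False  # cap reached: stop and escalate instead of looping
```

Passing `run` and `sleep` as parameters keeps the policy testable without spawning processes or waiting in real time.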

Context Hygiene Rules

Add to AGENTS.md:

## Agent Efficiency Rules

1. **Run `cargo preflight` before every completion claim.**
2. **If a command fails twice with the same error, stop and ask.**
3. **Do not read a file you already read in this session unless it changed.**
4. **Before using a new crate or API, verify it exists in `Cargo.lock`.**
5. **Keep context focused: only include files relevant to the current task.**

4. CI/CD Integration

We already use Woodpecker CI. The audit plan should extend CI to catch AI workflow regressions.

Proposed CI Checks

| Stage | Check | Trigger |
| --- | --- | --- |
| Lint | cargo clippy --workspace -- -D warnings | Every push |
| Format | cargo fmt --check | Every push |
| Doc | cargo doc --no-deps | Every push |
| Test | cargo test --workspace | Every push |
| Audit | Run session waste analyzer on last 7 days | Weekly cron (or manual) |

The weekly audit stage is lightweight: it queries the local opencode database (or a centralized copy) and posts a summary comment on the PR if waste patterns spike.
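
The plan does not define what counts as a spike. One possible definition, offered purely as an assumption, is a weekly waste rate that exceeds the trailing mean of earlier weeks by a fixed factor (1.5 here, an arbitrary starting point):

```python
from statistics import mean


def is_spike(history: list[float], current: float, factor: float = 1.5) -> bool:
    """True when `current` exceeds the trailing mean of `history` by
    `factor`. Both the rule and the default factor are assumptions."""
    if not history:
        return False  # no baseline yet, so nothing to compare against
    return current > factor * mean(history)
```

For example, with earlier weekly retry-loop rates of 0.20, 0.25, and 0.15, a current rate of 0.58 would be flagged, while 0.25 would not.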

5. Plugin and Linter Integration

Rather than building custom LLM-as-judge evals, we should leverage existing tools and explore opencode plugins that reduce waste.

Tool Inventory

| Tool | Current Use | AI Workflow Enhancement |
| --- | --- | --- |
| clippy | Lint enforcement | Add lints that catch common AI mistakes (e.g., unused imports from hallucinated code) |
| cargo-deny | License checking | Could extend to detect unexpected crate additions |
| lefthook | Pre-commit hooks | Add hook that warns if AGENTS.md context is stale |
| dprint | Markdown formatting | Ensure AGENTS.md and context files stay parseable |

Opencode Plugin Opportunities

The opencode ecosystem may support plugins that:

  • Intercept tool calls and cache file reads for the session duration
  • Enforce a “budget” of tool calls per task
  • Auto-run cargo preflight before allowing the agent to report success
  • Summarize session waste and append it to the cortex journal
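
Independent of whatever plugin API opencode turns out to offer, the first two ideas can be prototyped in isolation. The sketch below is not an opencode plugin; the class names and the policy of not charging cache hits against the budget are assumptions for illustration.

```python
import os
from pathlib import Path


class BudgetExceeded(RuntimeError):
    """Raised when a task uses more tool calls than its budget allows."""


class SessionReadCache:
    """Caches file reads for one session and counts real reads
    against a fixed tool-call budget."""

    def __init__(self, budget: int) -> None:
        self.budget = budget
        self.calls = 0
        self._cache: dict[str, tuple[int, str]] = {}

    def read(self, path: str) -> str:
        mtime = os.stat(path).st_mtime_ns
        cached = self._cache.get(path)
        if cached is not None and cached[0] == mtime:
            return cached[1]  # unchanged since last read: no budget spent
        if self.calls >= self.budget:
            raise BudgetExceeded(f"tool-call budget of {self.budget} used up")
        self.calls += 1
        text = Path(path).read_text(encoding="utf-8")
        self._cache[path] = (mtime, text)
        return text
```

Using the modification time as the change signal is cheap but coarse; hashing file contents would be stricter at the cost of reading the file anyway.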

These are research items. This plan flags them for investigation.

Measurement: What We Track

We cannot improve what we do not measure. The audit infrastructure should collect:

| Metric | Source | Target |
| --- | --- | --- |
| Session duration | opencode.db | Reduce median by 20% |
| Tool calls per session | opencode.db | Reduce median by 30% |
| cargo check failures before success | opencode.db + git log | Reduce retry rate below 10% |
| Clippy warnings on AI-generated commits | git diff + cargo clippy | Zero warnings |
| Agent backtracks (“sorry”, “you’re right”) | opencode.db message text | Reduce by 50% |
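
The median-based targets imply comparing a baseline period against a current one. A small sketch of that arithmetic follows; the function names are illustrative, and the choice of baseline window is left open by this plan.

```python
from statistics import median


def median_reduction(baseline: list[float], current: list[float]) -> float:
    """Fractional drop in the median from `baseline` to `current`.
    A negative result means the median went up."""
    base = median(baseline)
    if base == 0:
        raise ValueError("baseline median is zero; reduction is undefined")
    return (base - median(current)) / base


def meets_target(baseline: list[float], current: list[float], target: float) -> bool:
    """True when the median dropped by at least `target` (e.g. 0.20)."""
    return median_reduction(baseline, current) >= target
```

For instance, if tool calls per session had a baseline median of 50 and a current median of 35, the reduction is (50 - 35) / 50 = 0.30, which reaches the 30% target.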

Immediate Actions

  1. Write the session analyzer script. A standalone Rust binary or Python script that reads opencode.db and outputs waste statistics.
  2. Update AGENTS.md with agent efficiency rules. Codify the anti-thrashing guidelines.
  3. Add the cargo preflight command. Make it trivial for agents to self-check.
  4. Schedule weekly CI audit job. Run the analyzer and surface trends.
  5. Research opencode plugin API. Determine if custom plugins can enforce tool budgets.

Deferred Actions

  • Evals / statistical testing: Not pursued. The cost of LLM-as-judge evals outweighs the benefit for this project. We rely on deterministic static analysis instead.
  • Automated context updates via PR scanning: Interesting but requires a stable context format first. Revisit after AGENTS.md structure matures.

References