AI Workflow Audit Plan

Source: Dru Knox, Stop Prompting, Start Engineering: The “Context as Code” Shift, YouTube, February 25, 2026

This document captures actionable infrastructure changes to improve how cerebro manages AI workflows. The goal is to treat AI context and session behavior as observable, measurable systems, not as opaque chat sessions.

Mindset: From IC to Steward

When using AI agents, the human role shifts from individual contributor to tech lead. The job is no longer writing all the code directly; it is ensuring good code can be written by maintaining standards, documentation, and quality gates.

This shift is noted here but requires no tool changes. It is a framing device for the infrastructure decisions that follow.

Problem: Non-Deterministic Tooling Waste

AI agents are non-deterministic. They repeat tasks, thrash between approaches, make unnecessary tool calls, and burn tokens on false paths. Without telemetry, this waste is invisible. We need to instrument the agent’s behavior so we can detect and eliminate waste programmatically, not by watching every session.

Audit Infrastructure

1. Session Log Analysis Scripts

OpenCode stores session logs locally in SQLite (~/.local/share/opencode/opencode.db). We already read this database for dashboard rendering. We should add analysis scripts that mine session history for waste patterns.

Waste Patterns to Detect

Pattern	Description	Signal
Thrashing	Agent switches approaches repeatedly without progress	Same tool called >3x with similar args in one session
Retry loops	Agent retries after failures without changing strategy	`cargo check` fails, agent reruns same command
Over-tooling	Agent uses tools when direct reasoning suffices	File read followed by immediate file read of same content
Apology tax	Agent apologizes and backtracks	Phrases like “sorry”, “you’re absolutely right” in assistant messages
Token bloat	Context window fills with redundant planning	Repeated restatement of plan without new information
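The thrashing signal in the table can be sketched as a small clustering pass over one session's tool calls. This is a minimal illustration, not the analyzer itself: the `(tool, args)` tuple shape, the function name `find_thrashing`, and the 0.8 similarity threshold are all assumptions, and only the ">3x" cutoff comes from the table.

```python
from difflib import SequenceMatcher


def find_thrashing(calls, max_repeats=3, similarity=0.8):
    """Return (tool, representative_args, count) for call clusters above max_repeats.

    `calls` is a list of (tool_name, args_string) tuples from one session.
    The tuple shape and the similarity threshold are assumptions for this sketch.
    """
    clusters = []  # each cluster is [tool, representative_args, count]
    for tool, args in calls:
        for cluster in clusters:
            same_tool = cluster[0] == tool
            if same_tool and SequenceMatcher(None, cluster[1], args).ratio() >= similarity:
                cluster[2] += 1
                break
        else:
            # No similar earlier call for this tool: start a new cluster.
            clusters.append([tool, args, 1])
    return [tuple(c) for c in clusters if c[2] > max_repeats]


calls = [
    ("bash", "cargo check"),
    ("read", "src/main.rs"),
    ("bash", "cargo check "),
    ("bash", "cargo check"),
    ("bash", "cargo check --workspace"),
    ("bash", "cargo check"),
]
thrashing = find_thrashing(calls)
print(thrashing)
```

Greedy clustering against the first call in each cluster keeps the pass simple; a real analyzer would probably also want to check whether anything changed between the repeated calls.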

Proposed Script: `cerebro audit sessions`

A new CLI subcommand (or standalone script in scripts/) that:

- Queries the opencode database for recent sessions
- Parses message history for waste patterns
- Outputs a summary report with:
  - Sessions analyzed
  - Waste events detected (by category)
  - Estimated token cost of waste
  - Suggested configuration changes
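The query step might look roughly like the sketch below. The table and column names (`session`, `message`, `created_at`, `role`, `text`) are hypothetical: the actual opencode.db schema has to be inspected before any of this is adapted. The demo runs against an in-memory database so it does not touch a real session store.

```python
import sqlite3

# Hypothetical schema for illustration only; the real opencode.db layout
# must be checked first (for example with `.schema` in the sqlite3 shell).
SCHEMA = """
CREATE TABLE session (id TEXT PRIMARY KEY, created_at INTEGER);
CREATE TABLE message (session_id TEXT, role TEXT, text TEXT);
"""


def load_recent_messages(conn, since_epoch):
    """Map session id -> list of (role, text) for sessions created at or after since_epoch."""
    rows = conn.execute(
        """
        SELECT s.id, m.role, m.text
        FROM session AS s
        JOIN message AS m ON m.session_id = s.id
        WHERE s.created_at >= ?
        ORDER BY s.id, m.rowid
        """,
        (since_epoch,),
    )
    sessions = {}
    for session_id, role, text in rows:
        sessions.setdefault(session_id, []).append((role, text))
    return sessions


# Demo against an in-memory database so the sketch runs without opencode installed.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO session VALUES (?, ?)", [("old", 100), ("new", 200)])
conn.executemany(
    "INSERT INTO message VALUES (?, ?, ?)",
    [
        ("old", "user", "hi"),
        ("new", "user", "fix the build"),
        ("new", "assistant", "running cargo check"),
    ],
)
recent = load_recent_messages(conn, since_epoch=150)
print(recent)
```

Against the real database, the connection would be opened on `~/.local/share/opencode/opencode.db`, ideally read-only, so the analyzer cannot interfere with a running session.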

Example output:

Sessions analyzed: 12
Thrashing events: 3 (25% of sessions)
Retry loops: 7 (58% of sessions)
Over-tooling: 2 (17% of sessions)

Top suggestion: Add `cargo check` pre-validation to agent context
  Affected sessions: 5
  Waste pattern: Agent runs `cargo check`, sees error, proposes fix,
                 runs `cargo check` again without applying fix first.
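The percentages in the example output are consistent with counting each pattern at most once per session (3 of 12 is 25%). A minimal formatter under that assumption is sketched below; the function name `format_report` and its dictionary input are illustrative only.

```python
def format_report(sessions_analyzed, sessions_with_pattern):
    """Render summary lines; each percentage is the share of sessions showing the pattern.

    Assumes each pattern is counted at most once per session, which is what
    the example output's percentages imply.
    """
    lines = [f"Sessions analyzed: {sessions_analyzed}"]
    for label, count in sessions_with_pattern.items():
        share = round(100 * count / sessions_analyzed) if sessions_analyzed else 0
        lines.append(f"{label}: {count} ({share}% of sessions)")
    return "\n".join(lines)


report = format_report(
    12,
    {"Thrashing events": 3, "Retry loops": 7, "Over-tooling": 2},
)
print(report)
```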

2. Static Analysis for Hallucination Prevention

As we allow agents to perform more divergent and inductive reasoning, the risk of hallucinated APIs, incorrect types, and phantom dependencies increases. Static analysis is the first line of defense: it rejects nonsense at compile time, before it runs and before it gets committed.

We already enforce clippy::pedantic at the workspace level. This section proposes expanding that enforcement specifically for AI-generated code paths.

Current State

The project already forbids unwrap(), panic!(), and expect() in production code. This is an excellent baseline. We should extend this philosophy.

Proposed Additions

Check	Purpose	How
Import validation	Prevent phantom crate references	`cargo check` in CI must pass before any PR
Type strictness	Prevent inferred-type hallucinations	Deny the `clippy::pedantic` cast lints (e.g. `clippy::cast_possible_truncation`, `clippy::cast_sign_loss`) that flag lossy `as` conversions
Documentation coverage	Force agents to document public APIs	Enable the `missing_docs` lint; `RUSTDOCFLAGS="-D warnings" cargo doc --no-deps` must pass
Dead code detection	Catch abandoned experiments	`dead_code` (a rustc lint, not a clippy lint) as deny, not warn
TODO/FIXME enforcement	Prevent permanent placeholder code	Already exists; ensure it runs on AI-generated commits too

Pre-Flight Hook for Agent Sessions

Consider a lightweight cargo alias or script that agents run before submitting:

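#!/bin/sh
# Cargo aliases cannot chain commands with `&&`. Save this as an executable
# named `cargo-preflight` on PATH; Cargo then runs it as `cargo preflight`.
cargo fmt --check && cargo clippy --workspace -- -D warnings && cargo check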

Agents should be instructed to run cargo preflight before declaring a task complete. This catches hallucinations at the agent’s workstation, not in CI.
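The preflight sequence is three cargo commands run in order, stopping at the first failure. Where a portable runner is preferred, that behavior can be sketched in Python as below. The function name `run_preflight` and the injectable `run` parameter are assumptions made so the sketch can be exercised without a Rust toolchain; the three commands are the ones listed above.

```python
import subprocess

# The three steps named in the preflight definition, in order.
PREFLIGHT_STEPS = [
    ["cargo", "fmt", "--check"],
    ["cargo", "clippy", "--workspace", "--", "-D", "warnings"],
    ["cargo", "check"],
]


def run_preflight(steps=PREFLIGHT_STEPS, run=subprocess.call):
    """Run each step in order; return the first non-zero exit code, or 0 if all pass."""
    for step in steps:
        code = run(step)
        if code != 0:
            print(f"preflight failed at: {' '.join(step)}")
            return code
    return 0


# Demo with a fake runner (hypothetical) that makes the clippy step fail.
seen = []


def fake_run(step):
    seen.append(step[1])
    return 1 if step[1] == "clippy" else 0


exit_code = run_preflight(run=fake_run)
```

In real use the script would end with `sys.exit(run_preflight())` so the agent sees a non-zero exit status when any step fails.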

3. Configuration to Prevent Thrashing

OpenCode and similar agents accept configuration that shapes their behavior. We should codify anti-thrashing settings in the project’s AGENTS.md and explore opencode configuration options.

Token Budget Awareness

Setting	Suggestion	Rationale
Context window limit	Explicitly state the project’s preferred max	Prevents agents from stuffing irrelevant files into context
Tool call budget	Configure max sequential tool calls	Forces agent to reason rather than search
Retry limit	Cap retries at 2 with forced pause	Prevents infinite retry loops on flaky operations
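Whether opencode exposes settings for these limits is still an open question, so the sketch below only illustrates the bookkeeping such a limit implies. The class name `ToolBudget` and the default of 25 tool calls are hypothetical; the cap of 2 identical failures follows the retry limit in the table.

```python
class ToolBudget:
    """Track tool calls and repeated identical failures for one task.

    Hypothetical helper: the default budget of 25 calls is an illustrative
    value, not a recommendation from the plan.
    """

    def __init__(self, max_tool_calls=25, max_identical_failures=2):
        self.max_tool_calls = max_tool_calls
        self.max_identical_failures = max_identical_failures
        self.tool_calls = 0
        self.failures = {}  # (command, error) -> number of times seen

    def record_call(self):
        """Count a tool call; return False once the budget is exceeded."""
        self.tool_calls += 1
        return self.tool_calls <= self.max_tool_calls

    def record_failure(self, command, error):
        """Count a failure; return False once the same failure reaches the cap."""
        key = (command, error)
        self.failures[key] = self.failures.get(key, 0) + 1
        return self.failures[key] < self.max_identical_failures


budget = ToolBudget(max_tool_calls=3)
first = budget.record_failure("cargo check", "E0432")      # first occurrence: keep going
second = budget.record_failure("cargo check", "E0432")     # same error again: stop and ask
different = budget.record_failure("cargo check", "E0308")  # new error: progress was made
calls_allowed = [budget.record_call() for _ in range(4)]
```

Keying failures on the (command, error) pair means a changed error resets the count, so an agent that is making progress is not cut off.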

Context Hygiene Rules

Add to AGENTS.md:

## Agent Efficiency Rules

1. **Run `cargo preflight` before every completion claim.**
2. **If a command fails twice with the same error, stop and ask.**
3. **Do not read a file you already read in this session unless it changed.**
4. **Before using a new crate, verify it is listed in `Cargo.lock`; before using an unfamiliar API, verify it exists in that crate's docs or source.**
5. **Keep context focused: only include files relevant to the current task.**
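Rule 4 can be checked mechanically. The sketch below pulls package names out of `Cargo.lock` text with a regular expression and compares them with hyphens and underscores treated as equivalent, since code refers to `serde_json` while the lock file records `serde-json`-style package names. The function names and the sample lock contents are illustrative only.

```python
import re


def locked_crates(lock_text):
    """Return the set of package names declared in Cargo.lock text."""
    # In Cargo.lock, a line starting with `name = "..."` belongs to a [[package]] entry;
    # dependency lists are indented string arrays and do not match.
    return set(re.findall(r'^name = "([^"]+)"$', lock_text, flags=re.MULTILINE))


def is_locked(crate, lock_text):
    """Check for a crate, treating `-` and `_` as equivalent."""
    wanted = crate.replace("-", "_")
    return wanted in {name.replace("-", "_") for name in locked_crates(lock_text)}


# Sample lock contents; the crate versions are illustrative, not project data.
SAMPLE_LOCK = '''\
version = 3

[[package]]
name = "anyhow"
version = "1.0.75"

[[package]]
name = "serde-json-example"
version = "0.1.0"
dependencies = [
 "anyhow",
]
'''

crates = locked_crates(SAMPLE_LOCK)
print(sorted(crates))
```

A lock-file hit only shows the crate is a resolved dependency somewhere in the workspace; whether a specific function or type exists still has to be checked against the crate itself.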

4. CI/CD Integration

We already use Woodpecker CI. The audit plan should extend CI to catch AI workflow regressions.

Proposed CI Checks

Stage	Check	Trigger
Lint	`cargo clippy --workspace -- -D warnings`	Every push
Format	`cargo fmt --check`	Every push
Doc	`cargo doc --no-deps`	Every push
Test	`cargo test --workspace`	Every push
Audit	Run session waste analyzer on last 7 days	Weekly cron (or manual)

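The weekly audit stage is lightweight: it queries the local opencode database (or a centralized copy) and, if waste patterns spike, posts a summary comment. Because a cron run has no PR of its own, the comment target (an open PR or a tracking issue) still needs to be chosen.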
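The plan does not define what counts as a spike. One simple reading is "the latest weekly waste rate is well above the average of earlier weeks", sketched below. The 1.5x factor, the three-week minimum history, and the function name `is_spike` are all assumptions to be tuned once real data exists.

```python
from statistics import mean


def is_spike(weekly_rates, factor=1.5, min_history=3):
    """Return True if the latest weekly rate exceeds factor x the mean of earlier weeks.

    `weekly_rates` is ordered oldest to newest. The factor and the minimum
    history length are assumed defaults, not values from the plan.
    """
    if len(weekly_rates) <= min_history:
        return False  # not enough history to call anything a spike
    *history, latest = weekly_rates
    return latest > factor * mean(history)


spiking = is_spike([0.20, 0.25, 0.22, 0.58])
steady = is_spike([0.20, 0.25, 0.22, 0.24])
too_short = is_spike([0.20, 0.90])
```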

5. Plugin and Linter Integration

Rather than building custom LLM-as-judge evals, we should leverage existing tools and explore opencode plugins that reduce waste.

Tool Inventory

Tool	Current Use	AI Workflow Enhancement
`clippy`	Lint enforcement	Add lints that catch common AI mistakes (e.g., unused imports from hallucinated code)
`cargo-deny`	License checking	Could extend to detect unexpected crate additions
`lefthook`	Pre-commit hooks	Add hook that warns if `AGENTS.md` context is stale
`dprint`	Markdown formatting	Ensure `AGENTS.md` and context files stay parseable

Opencode Plugin Opportunities

The opencode ecosystem may support plugins that:

- Intercept tool calls and cache file reads for the session duration
- Enforce a “budget” of tool calls per task
- Auto-run cargo preflight before allowing the agent to report success
- Summarize session waste and append it to the cortex journal

These are research items. This plan flags them for investigation.
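To make the first research item concrete, the sketch below shows the caching logic such a plugin would need, independent of any plugin API. It does not use or describe opencode's plugin interface, which is still to be investigated; the class name `SessionReadCache` is hypothetical. Entries are invalidated when a file's modification time or size changes.

```python
import os
import tempfile


class SessionReadCache:
    """Cache file contents for one session, re-reading only when the file changes."""

    def __init__(self):
        self._cache = {}  # path -> ((mtime_ns, size), content)
        self.hits = 0

    def read(self, path):
        stat = os.stat(path)
        fingerprint = (stat.st_mtime_ns, stat.st_size)
        cached = self._cache.get(path)
        if cached is not None and cached[0] == fingerprint:
            self.hits += 1  # unchanged since last read: serve from memory
            return cached[1]
        with open(path, encoding="utf-8") as handle:
            content = handle.read()
        self._cache[path] = (fingerprint, content)
        return content


cache = SessionReadCache()
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "lib.rs")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("fn a() {}")
    first_read = cache.read(path)
    second_read = cache.read(path)  # served from the cache
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("fn a() {}\nfn b() {}")
    third_read = cache.read(path)   # file changed, so it is re-read
```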

Measurement: What We Track

We cannot improve what we do not measure. The audit infrastructure should collect:

Metric	Source	Target
Session duration	opencode.db	Reduce median by 20%
Tool calls per session	opencode.db	Reduce median by 30%
`cargo check` failures before success	opencode.db + git log	Reduce retry rate below 10%
Clippy warnings on AI-generated commits	git diff + `cargo clippy`	Zero warnings
Agent backtracks (“sorry”, “you’re right”)	opencode.db message text	Reduce by 50%
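Two of these metrics reduce to simple computations once the message text and per-session tool-call counts have been extracted. The sketch below counts backtrack phrases and computes the relative drop in a median. The phrase pattern covers only the examples named in this plan, and the function names and sample numbers are illustrative.

```python
import re
from statistics import median

# Matches the example phrases from this plan; extend as real sessions suggest.
BACKTRACK = re.compile(r"\b(?:sorry|you[’']re (?:absolutely )?right)\b", re.IGNORECASE)


def count_backtracks(assistant_messages):
    """Count backtrack phrases across a list of assistant message strings."""
    return sum(len(BACKTRACK.findall(text)) for text in assistant_messages)


def median_reduction(before, after):
    """Return the relative drop in the median, e.g. 0.3 for a 30% reduction."""
    baseline = median(before)
    return (baseline - median(after)) / baseline


messages = [
    "Sorry, you're absolutely right. Let me redo that.",
    "Running cargo check.",
    "You’re right, I missed the import.",
]
backtracks = count_backtracks(messages)

# Illustrative tool-call counts per session, not measured data.
reduction = median_reduction(before=[40, 50, 60], after=[30, 35, 45])
print(backtracks, reduction)
```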

Immediate Actions

- Write the session analyzer script. A standalone Rust binary or Python script that reads opencode.db and outputs waste statistics.
- Update AGENTS.md with agent efficiency rules. Codify the anti-thrashing guidelines.
- Add the cargo preflight command. Make it trivial for agents to self-check.
- Schedule weekly CI audit job. Run the analyzer and surface trends.
- Research opencode plugin API. Determine if custom plugins can enforce tool budgets.

Deferred Actions

- Evals / statistical testing: Not pursued. The cost of LLM-as-judge evals outweighs the benefit for this project. We rely on deterministic static analysis instead.
- Automated context updates via PR scanning: Interesting but requires a stable context format first. Revisit after AGENTS.md structure matures.

References

Dru Knox, “Stop Prompting, Start Engineering: The ‘Context as Code’ Shift,” YouTube, February 25, 2026
Cerebro AGENTS.md: Current agent guidelines
ADR 0005: MCP server for OpenCode: Existing OpenCode integration
