AI Workflow Audit Plan
- Source
- Dru Knox, Stop Prompting, Start Engineering: The “Context as Code” Shift, YouTube, February 25, 2026
This document captures actionable infrastructure changes to improve how cerebro manages AI workflows. The goal is to treat AI context and session behavior as observable, measurable systems, not as opaque chat sessions.
Mindset: From IC to Steward
When using AI agents, the human role shifts from individual contributor to tech lead. The job is no longer writing all the code directly; it is ensuring good code can be written by maintaining standards, documentation, and quality gates.
This shift is noted here but requires no tool changes. It is a framing device for the infrastructure decisions that follow.
Problem: Non-Deterministic Tooling Waste
AI agents are non-deterministic. They repeat tasks, thrash between approaches, make unnecessary tool calls, and burn tokens on false paths. Without telemetry, this waste is invisible. We need to instrument the agent’s behavior so we can detect and eliminate waste programmatically, not by watching every session.
Audit Infrastructure
1. Session Log Analysis Scripts
OpenCode stores session logs locally in SQLite (~/.local/share/opencode/opencode.db). We
already read this database for dashboard rendering. We should add analysis scripts that
mine session history for waste patterns.
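This plan does not record the schema of opencode.db, so a reasonable first step for any analysis script is to inspect the database rather than assume table names. A minimal sketch using only Python's standard sqlite3 module: the function name describe_schema is illustrative, and the file is opened read-only so the analyzer cannot disturb a running session.

```python
import sqlite3
from pathlib import Path

# Path taken from the text above; adjust if opencode is configured differently.
DEFAULT_DB = Path.home() / ".local/share/opencode/opencode.db"


def describe_schema(db_path=DEFAULT_DB):
    """Map each table name to its column names, opening the file read-only."""
    uri = f"{Path(db_path).resolve().as_uri()}?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        return {
            # Column 1 of PRAGMA table_info is the column name.
            t: [col[1] for col in conn.execute(f'PRAGMA table_info("{t}")')]
            for t in tables
        }
    finally:
        conn.close()
```

Running describe_schema() against the real database should show where sessions, messages, and tool calls are stored before any detector is written against them.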
Waste Patterns to Detect
| Pattern | Description | Signal |
|---|---|---|
| Thrashing | Agent switches approaches repeatedly without progress | Same tool called >3x with similar args in one session |
| Retry loops | Agent retries after failures without changing strategy | cargo check fails, agent reruns same command |
| Over-tooling | Agent uses tools when direct reasoning suffices | File read followed by immediate file read of same content |
| Apology tax | Agent apologizes and backtracks | Phrases like “sorry”, “you’re absolutely right” in assistant messages |
| Token bloat | Context window fills with redundant planning | Repeated restatement of plan without new information |
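Two of these signals, thrashing and retry loops, can be sketched as pure functions over an already-parsed list of tool calls. The ToolCall record, its fields, and the exact-match reading of "similar args" are assumptions for illustration; the real shape depends on how opencode.db stores tool calls.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    """One tool invocation from a session (hypothetical record shape)."""
    tool: str            # e.g. "bash" or "read"
    args: str            # normalized argument string
    failed: bool = False


def find_thrashing(calls, threshold=3):
    """Return (tool, args) pairs invoked more than `threshold` times."""
    counts = Counter((c.tool, c.args) for c in calls)
    return sorted(key for key, n in counts.items() if n > threshold)


def count_retry_loops(calls):
    """Count failed calls that are immediately repeated with identical tool and args."""
    loops = 0
    for prev, cur in zip(calls, calls[1:]):
        if prev.failed and (prev.tool, prev.args) == (cur.tool, cur.args):
            loops += 1
    return loops
```

Exact matching keeps false positives low. A looser notion of similarity (for example, ignoring whitespace or line ranges) could be layered on by normalizing args before the records are built.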
Proposed Script: cerebro audit sessions
A new CLI subcommand (or standalone script in scripts/) that:
- Queries the opencode database for recent sessions
- Parses message history for waste patterns
- Outputs a summary report with:
  - Sessions analyzed
  - Waste events detected (by category)
  - Estimated token cost of waste
  - Suggested configuration changes
Example output:
Sessions analyzed: 12
Thrashing events: 3 (25% of sessions)
Retry loops: 7 (58% of sessions)
Over-tooling: 2 (17% of sessions)
Top suggestion: Add `cargo check` pre-validation to agent context
Affected sessions: 5
Waste pattern: Agent runs `cargo check`, sees error, proposes fix,
runs `cargo check` again without applying fix first.
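The header lines of such a report could be produced by a function like the following sketch. The input shape (a dict from session id to a list of detected category keys) and the category keys themselves are assumptions. The event count and the share of affected sessions are computed separately, since one session can contain several events.

```python
# (report label, category key) pairs; the keys are hypothetical names.
CATEGORIES = [
    ("Thrashing events", "thrashing"),
    ("Retry loops", "retry_loop"),
    ("Over-tooling", "over_tooling"),
]


def summarize(sessions):
    """Format a summary from {session id: [category key, ...]}."""
    total = len(sessions)
    lines = [f"Sessions analyzed: {total}"]
    for label, key in CATEGORIES:
        events = sum(kinds.count(key) for kinds in sessions.values())
        affected = sum(1 for kinds in sessions.values() if key in kinds)
        pct = round(100 * affected / total) if total else 0
        lines.append(f"{label}: {events} ({pct}% of sessions)")
    return "\n".join(lines)
```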
2. Static Analysis for Hallucination Prevention
As we allow agents to perform more divergent and inductive reasoning, the risk of hallucinated APIs, incorrect types, and phantom dependencies increases. Static analysis is the first line of defense: it catches nonsense before it compiles, before it runs, before it gets committed.
We already enforce clippy::pedantic at the workspace level. This section proposes expanding
that enforcement specifically for AI-generated code paths.
Current State
The project already forbids unwrap(), panic!(), and expect() in production code. This
is an excellent baseline. We should extend this philosophy.
Proposed Additions
| Check | Purpose | How |
|---|---|---|
| Import validation | Prevent phantom crate references | cargo check in CI must pass before any PR |
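| Type strictness | Prevent inferred-type hallucinations | Deny the pedantic cast lints (e.g. clippy::cast_possible_truncation, clippy::cast_sign_loss), which flag lossy as conversions |
| Documentation coverage | Force agents to document public APIs | Deny the missing_docs lint; run cargo doc --no-deps with RUSTDOCFLAGS="-D warnings" so rustdoc warnings fail the build |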
| Dead code detection | Catch abandoned experiments | dead_code (a rustc lint, not a Clippy lint) as deny, not warn |
| TODO/FIXME enforcement | Prevent permanent placeholder code | Already exists; ensure it runs on AI-generated commits too |
Pre-Flight Hook for Agent Sessions
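Cargo aliases expand to a single cargo subcommand, so an alias cannot chain several commands with &&. Consider instead a lightweight script named cargo-preflight on the PATH, which Cargo picks up as cargo preflight, for agents to run before submitting:
#!/bin/sh
# Hypothetical scripts/cargo-preflight; must be executable and on the PATH
cargo fmt --check && cargo clippy --workspace -- -D warnings && cargo check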
Agents should be instructed to run cargo preflight before declaring a task complete. This
catches hallucinations at the agent’s workstation, not in CI.
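The same checks can also be driven from a small runner that reports which step failed, which gives an agent more to act on than a bare exit code. A sketch in Python: the run parameter is an illustrative hook that defaults to subprocess.run and exists so the logic can be exercised without a Rust toolchain.

```python
import subprocess

# The three checks named above, in the order they should run.
STEPS = [
    ["cargo", "fmt", "--check"],
    ["cargo", "clippy", "--workspace", "--", "-D", "warnings"],
    ["cargo", "check"],
]


def preflight(run=subprocess.run):
    """Run each step in order; return the first failing command, or None."""
    for cmd in STEPS:
        if run(cmd).returncode != 0:
            return cmd
    return None
```

Stopping at the first failure mirrors the && chaining: there is little value in type-checking code that has not passed formatting and lints.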
3. Configuration to Prevent Thrashing
OpenCode and similar agents accept configuration that shapes their behavior. We should
codify anti-thrashing settings in the project’s AGENTS.md and explore opencode
configuration options.
Token Budget Awareness
| Setting | Suggestion | Rationale |
|---|---|---|
| Context window limit | Explicitly state the project’s preferred max | Prevents agents from stuffing irrelevant files into context |
| Tool call budget | Configure max sequential tool calls | Forces agent to reason rather than search |
| Retry limit | Cap retries at 2 with forced pause | Prevents infinite retry loops on flaky operations |
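Whether opencode exposes a retry cap as a setting is still to be researched. As an illustration of the intended behavior only, the retry limit can be sketched as a guard that fingerprints each failure and signals a stop once the same command has failed the same way twice. The class and method names are hypothetical.

```python
import hashlib


class RetryGuard:
    """Stop an agent loop once the same command fails the same way too often."""

    def __init__(self, max_identical_failures=2):
        self.max_identical_failures = max_identical_failures
        self._seen = {}  # failure fingerprint -> occurrence count

    def record_failure(self, command, error_text):
        """Record a failure; return True when the agent should stop and ask."""
        key = hashlib.sha256(f"{command}\0{error_text}".encode()).hexdigest()
        self._seen[key] = self._seen.get(key, 0) + 1
        return self._seen[key] >= self.max_identical_failures
```

Keying on the error text as well as the command means a retry that produces a different error does not count against the cap, since the agent has at least changed something.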
Context Hygiene Rules
Add to AGENTS.md:
## Agent Efficiency Rules
1. **Run `cargo preflight` before every completion claim.**
2. **If a command fails twice with the same error, stop and ask.**
3. **Do not read a file you already read in this session unless it changed.**
4. **Before using a new crate or API, verify the crate is listed in `Cargo.lock` and the API exists in the locked version.**
5. **Keep context focused: only include files relevant to the current task.**
4. CI/CD Integration
We already use Woodpecker CI. The audit plan should extend CI to catch AI workflow regressions.
Proposed CI Checks
| Stage | Check | Trigger |
|---|---|---|
| Lint | cargo clippy --workspace -- -D warnings | Every push |
| Format | cargo fmt --check | Every push |
| Doc | cargo doc --no-deps | Every push |
| Test | cargo test --workspace | Every push |
| Audit | Run session waste analyzer on last 7 days | Weekly cron (or manual) |
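The weekly audit stage is lightweight: it queries a copy of the opencode database that the CI runner can reach (the database lives on each developer's machine, so it must be exported or centralized first) and posts a summary if waste patterns spike. Because a cron run has no PR of its own, the summary needs an explicit destination, such as a comment on a tracking issue or an open PR.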
5. Plugin and Linter Integration
Rather than building custom LLM-as-judge evals, we should leverage existing tools and explore opencode plugins that reduce waste.
Tool Inventory
| Tool | Current Use | AI Workflow Enhancement |
|---|---|---|
| clippy | Lint enforcement | Add lints that catch common AI mistakes (e.g., unused imports from hallucinated code) |
| cargo-deny | License checking | Could extend to detect unexpected crate additions |
| lefthook | Pre-commit hooks | Add hook that warns if AGENTS.md context is stale |
| dprint | Markdown formatting | Ensure AGENTS.md and context files stay parseable |
OpenCode Plugin Opportunities
The opencode ecosystem may support plugins that:
- Intercept tool calls and cache file reads for the session duration
- Enforce a “budget” of tool calls per task
- Auto-run cargo preflight before allowing the agent to report success
- Summarize session waste and append it to the cortex journal
These are research items. This plan flags them for investigation.
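To make the first item concrete, the caching behavior can be sketched independently of any plugin API. This is not opencode plugin code: the class below is a hypothetical illustration of a per-session read cache that serves repeated reads from memory and re-reads only when the file's modification time or size changes.

```python
import os


class SessionReadCache:
    """Serve repeated reads of an unchanged file from memory."""

    def __init__(self):
        self._entries = {}  # path -> ((mtime_ns, size), text)
        self.hits = 0       # reads answered without touching the file body

    def read(self, path):
        stat = os.stat(path)
        stamp = (stat.st_mtime_ns, stat.st_size)
        cached = self._entries.get(path)
        if cached and cached[0] == stamp:
            self.hits += 1
            return cached[1]
        with open(path, encoding="utf-8") as fh:
            text = fh.read()
        self._entries[path] = (stamp, text)
        return text
```

The hits counter doubles as a measurement: a high value at the end of a session indicates how many redundant reads the agent attempted.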
Measurement: What We Track
We cannot improve what we do not measure. The audit infrastructure should collect:
| Metric | Source | Target |
|---|---|---|
| Session duration | opencode.db | Reduce median by 20% |
| Tool calls per session | opencode.db | Reduce median by 30% |
| cargo check failures before success | opencode.db + git log | Reduce retry rate below 10% |
| Clippy warnings on AI-generated commits | git diff + cargo clippy | Zero warnings |
| Agent backtracks (“sorry”, “you’re right”) | opencode.db message text | Reduce by 50% |
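Two of these metrics reduce to a few lines once the message text and per-session counts have been extracted. A sketch, assuming messages arrive as plain strings and per-session values as lists of numbers; the phrase list comes from the table above and the function names are illustrative.

```python
import re
from statistics import median

# Phrases taken from the table above; extend as real sessions suggest.
BACKTRACK = re.compile(r"\bsorry\b|you[’']re (?:absolutely )?right", re.IGNORECASE)


def count_backtracks(messages):
    """Count assistant messages containing a backtrack phrase."""
    return sum(1 for text in messages if BACKTRACK.search(text))


def median_reduction(baseline, current):
    """Percent drop in the median from baseline to current (positive is better)."""
    base = median(baseline)
    return 100 * (base - median(current)) / base
```

For example, a median of 20 tool calls per session falling to 14 gives median_reduction a value of 30.0, which would meet the tool-call target.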
Immediate Actions
- Write the session analyzer script. A standalone Rust binary or Python script that reads opencode.db and outputs waste statistics.
- Update AGENTS.md with agent efficiency rules. Codify the anti-thrashing guidelines.
- Add the cargo preflight command. Make it trivial for agents to self-check.
- Schedule weekly CI audit job. Run the analyzer and surface trends.
- Research opencode plugin API. Determine if custom plugins can enforce tool budgets.
Deferred Actions
- Evals / statistical testing: Not pursued. The cost of LLM-as-judge evals outweighs the benefit for this project. We rely on deterministic static analysis instead.
- Automated context updates via PR scanning: Interesting but requires a stable context format first. Revisit after AGENTS.md structure matures.
References
- Dru Knox, “Stop Prompting, Start Engineering: The ‘Context as Code’ Shift,” YouTube, February 25, 2026
- Cerebro AGENTS.md: Current agent guidelines
- ADR 0005: MCP server for OpenCode: Existing OpenCode integration