Botface Architecture

Project: Botface - Rust Voice Assistant for Batocera/Raspberry Pi
Status: Active Development
Last Updated: March 2026


System Overview

Botface is a voice-controlled AI assistant that runs on Raspberry Pi with Batocera Linux. It provides hands-free interaction through wake word detection, speech recognition, AI language model integration, and text-to-speech responses.


Core Components

1. Audio Subsystem (audio/)

Purpose: Capture microphone input and play back responses
Pattern: Graybox - simple AudioCapture interface, complex ALSA implementation hidden

Interface:

  • AudioCapture::new() - Configure capture
  • start_continuous() - Stream audio chunks
  • ContinuousHandle - Stop recording

Hardware:

  • Raspberry Pi: ALSA via arecord/aplay subprocesses
  • Local dev: Any audio device (macOS compatible)
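
As a sketch of the subprocess approach, this is roughly how an `AudioCapture` implementation might build the `arecord` invocation on the Pi. The function name and the specific flag values are illustrative assumptions, not the project's actual configuration:

```rust
use std::process::Command;

/// Hypothetical sketch: assemble the `arecord` command a capture
/// implementation might spawn. Flag values are assumptions.
fn build_arecord_command(device: &str, sample_rate: u32) -> Command {
    let mut cmd = Command::new("arecord");
    cmd.args([
        "-D", device,           // ALSA device, e.g. "plughw:1,0"
        "-f", "S16_LE",         // 16-bit little-endian PCM
        "-r", &sample_rate.to_string(),
        "-c", "1",              // mono
        "-t", "raw",            // raw samples on stdout for streaming
    ]);
    cmd
}
```

Streaming then amounts to reading the child's stdout in fixed-size chunks and handing them to the wake word detector.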

2. Wake Word Detection (wakeword/)

Purpose: Detect “Hey Jarvis” wake phrase
Pattern: Graybox - WakeWordDetector struct, ONNX inference hidden

Interface:

  • WakeWordDetector::new() - Load ONNX model
  • predict() - Check audio chunk for wake word
  • reset() - Clear buffer after detection

Implementation:

  • ONNX Runtime for inference
  • Resampling: 48kHz → 16kHz via rubato
  • Prediction buffer accumulation (not immediate results)
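
The buffer-accumulation idea can be sketched without the ONNX machinery: instead of firing on a single model score, keep a short rolling window and report a detection only when the averaged score crosses a threshold. The struct name, window size, and threshold below are illustrative assumptions:

```rust
/// Hypothetical sketch of prediction-buffer accumulation (not the
/// project's actual implementation).
struct PredictionBuffer {
    scores: Vec<f32>,
    window: usize,
    threshold: f32,
}

impl PredictionBuffer {
    fn new(window: usize, threshold: f32) -> Self {
        Self { scores: Vec::new(), window, threshold }
    }

    /// Push one per-chunk score; true once the rolling average over the
    /// last `window` chunks exceeds `threshold`.
    fn push(&mut self, score: f32) -> bool {
        self.scores.push(score);
        if self.scores.len() < self.window {
            return false;
        }
        let tail = &self.scores[self.scores.len() - self.window..];
        tail.iter().sum::<f32>() / self.window as f32 > self.threshold
    }

    /// Clear accumulated scores after a detection (mirrors `reset()`).
    fn reset(&mut self) {
        self.scores.clear();
    }
}
```

Averaging over a window suppresses one-chunk false positives at the cost of a few chunks of latency.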

3. Speech-to-Text (stt/)

Purpose: Convert speech audio to text
Pattern: Graybox - SttEngine interface, whisper.cpp hidden

Interface:

  • SttEngine::new() - Initialize with model
  • transcribe() - Audio → Text
  • supported_languages() - Query capabilities

Implementation:

  • whisper.cpp subprocess (local, no cloud)
  • WAV input file → text output
  • Language auto-detection
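
A sketch of how an `SttEngine` might shell out to whisper.cpp. The binary path and flags are assumptions; check your whisper.cpp build's `--help` for the exact options it accepts:

```rust
use std::process::Command;

/// Hypothetical sketch: assemble a whisper.cpp invocation.
/// Flags shown are assumptions about the CLI.
fn build_whisper_command(binary: &str, model: &str, wav_path: &str) -> Command {
    let mut cmd = Command::new(binary);
    cmd.args([
        "-m", model,     // GGML model file
        "-f", wav_path,  // 16 kHz mono WAV input
        "-l", "auto",    // language auto-detection
        "-nt",           // no timestamps: plain text output
    ]);
    cmd
}
```

The engine would capture stdout, trim whitespace, and return the result as the transcription.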

4. Language Model (llm/)

Purpose: Generate AI responses to user queries
Pattern: Graybox - LlmClient interface, Ollama API hidden

Interface:

  • LlmClient::new() - Configure endpoint
  • chat() - Send message, get response
  • with_memory() - Enable conversation history
  • with_search() - Enable web search

Implementation:

  • HTTP client to local Ollama server
  • No API keys required (self-hosted)
  • Optional: conversation memory, web search
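
To make the HTTP layer concrete, here is a sketch of the JSON body an `LlmClient` might POST to a local Ollama server's `/api/chat` endpoint. The naive string formatting is for brevity only; a real client should use a JSON library (e.g. serde_json) so that quotes and newlines in the content are escaped correctly:

```rust
/// Hypothetical sketch of an Ollama /api/chat request body.
/// WARNING: no JSON escaping; illustration only.
fn chat_request_body(model: &str, system: &str, user: &str) -> String {
    format!(
        r#"{{"model":"{model}","stream":false,"messages":[{{"role":"system","content":"{system}"}},{{"role":"user","content":"{user}"}}]}}"#
    )
}
```

Because the server is self-hosted on localhost, no authentication header is needed, which is why the client stays so small.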

5. Text-to-Speech (tts/)

Purpose: Convert text responses to speech
Pattern: Graybox - TtsEngine interface, Piper hidden

Interface:

  • TtsEngine::new() - Load voice model
  • speak() - Text → Audio (PCM samples)
  • is_speaking() / stop() - Control playback

Implementation:

  • Piper TTS (fast, local neural TTS)
  • WAV output converted to PCM
  • Voice model caching
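
The WAV → PCM step can be sketched in a few lines: strip a canonical 44-byte RIFF header and reinterpret the payload as 16-bit little-endian samples. This is a simplification; real WAV files can carry extra chunks, so a production implementation should walk the chunk list rather than assume a fixed header size:

```rust
/// Hypothetical sketch: convert a canonical 44-byte-header WAV buffer
/// into i16 PCM samples. Not robust to extra RIFF chunks.
fn wav_to_pcm_i16(wav: &[u8]) -> Option<Vec<i16>> {
    const HEADER: usize = 44;
    if wav.len() < HEADER || wav[..4] != *b"RIFF" || wav[8..12] != *b"WAVE" {
        return None;
    }
    Some(
        wav[HEADER..]
            .chunks_exact(2) // two bytes per 16-bit sample
            .map(|b| i16::from_le_bytes([b[0], b[1]]))
            .collect(),
    )
}
```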

6. Sound Effects (sounds/)

Purpose: Audio feedback for state transitions
Pattern: Graybox - already clean interface

Interface:

  • SoundPlayer::new() - Configure directories
  • play_greeting() - Startup sound
  • play_ack() - Wake word detected
  • play_thinking() - Processing
  • play_error() - Something went wrong

Implementation:

  • Random selection from category directories
  • WAV files played via aplay
  • Can be disabled
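
Random selection needs no external crate; one sketch is to derive a pseudo-random index from the clock. The function name is illustrative; `candidates` would be the WAV files found in a category directory (e.g. an `ack/` subdirectory):

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical sketch: pick one sound from a category without the
/// `rand` crate, using clock nanoseconds as a cheap entropy source.
fn pick_sound(candidates: &[String]) -> Option<&String> {
    if candidates.is_empty() {
        return None; // category disabled or directory empty
    }
    let nanos = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.subsec_nanos() as usize)
        .unwrap_or(0);
    candidates.get(nanos % candidates.len())
}
```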

7. GPIO Control (gpio/)

Purpose: Hardware feedback (LED, button)
Pattern: Trait-based abstraction - Gpio trait

Interface:

  • Gpio::led_on() / led_off() - Visual feedback
  • Gpio::is_button_pressed() - Physical input
  • AiyHatMock - Test without hardware

Implementation:

  • Real: gpioset/gpioget via AIY Voice HAT
  • Mock: Console output only
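
The trait-based abstraction reduces to one trait and a mock for development machines. The method names mirror those listed above; the mock's fields and the exact signatures are assumptions for illustration:

```rust
/// Hypothetical sketch of the GPIO abstraction: the real AIY HAT
/// implementation would shell out to gpioset/gpioget, the mock only
/// records state and prints.
trait Gpio {
    fn led_on(&mut self);
    fn led_off(&mut self);
    fn is_button_pressed(&self) -> bool;
}

/// Mock used off-device: no hardware access.
struct AiyHatMock {
    led: bool,
    button: bool,
}

impl Gpio for AiyHatMock {
    fn led_on(&mut self) {
        self.led = true;
        println!("[mock] LED on");
    }
    fn led_off(&mut self) {
        self.led = false;
        println!("[mock] LED off");
    }
    fn is_button_pressed(&self) -> bool {
        self.button
    }
}
```

The state machine only ever sees `dyn Gpio`, so swapping real hardware for the mock is a construction-time decision.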

8. State Machine (state_machine.rs)

Purpose: Orchestrate the conversation flow
Pattern: Single file, clean state transitions

States:

Idle → Listening → Recording → Transcribing → Thinking → Speaking → Idle

Key Features:

  • Async/await throughout
  • Non-blocking I/O
  • Error recovery (transitions to Error state)
  • Activation counter (statistics)
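
The happy-path cycle above reduces to a simple enum walk. This sketch omits the async machinery and the inputs that actually drive each transition (wake word, transcription result, and so on); it only illustrates the state shape:

```rust
/// Hypothetical sketch of the state cycle; the real state_machine.rs is
/// async and event-driven.
#[derive(Debug, Clone, Copy, PartialEq)]
enum State {
    Idle,
    Listening,
    Recording,
    Transcribing,
    Thinking,
    Speaking,
    Error,
}

fn next(state: State) -> State {
    use State::*;
    match state {
        Idle => Listening,         // start continuous capture
        Listening => Recording,    // wake word detected
        Recording => Transcribing, // command captured
        Transcribing => Thinking,  // text sent to LLM
        Thinking => Speaking,      // response synthesized
        Speaking => Idle,          // playback finished
        Error => Idle,             // recovery path
    }
}
```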

Data Flow

┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│   Audio In  │────▶│  Wake Word   │────▶│  Recording  │
│ (Microphone)│     │  Detection   │     │   (STT)     │
└─────────────┘     └──────────────┘     └──────┬──────┘
                                                 │
                                                 ▼
┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│  Audio Out  │◀────│     TTS      │◀────│    LLM      │
│  (Speaker)  │     │ (Response)   │     │ (Thinking)  │
└─────────────┘     └──────────────┘     └─────────────┘

Flow:

  1. Continuous audio capture
  2. Wake word detection (“Hey Jarvis”)
  3. Recording user command
  4. STT transcription
  5. LLM generates response
  6. TTS synthesizes speech
  7. Audio playback

Configuration

File: config.toml (TOML format)

Sections:

  • [audio] - Sample rate, device, format
  • [wakeword] - Model path, threshold
  • [stt] - Whisper binary, model, language
  • [llm] - Ollama URL, model, system prompt
  • [tts] - Piper binary, voice model
  • [gpio] - Pin numbers, mock mode
  • [sounds] - Sound directories, enabled
  • [dev_mode] - Local testing flags

Environment-specific:

  • Pi/Batocera: Uses hardware pins, ALSA
  • Local dev: Mock GPIO, any audio device
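
A minimal config.toml sketch covering a few of the sections above. The key names are plausible assumptions for illustration; the actual keys are defined by the project's config loader:

```toml
# Hypothetical sketch; key names are assumptions, not the real schema.
[audio]
device = "plughw:1,0"
sample_rate = 48000

[wakeword]
model_path = "models/hey_jarvis.onnx"
threshold = 0.5

[llm]
url = "http://127.0.0.1:11434"
model = "llama3"

[gpio]
mock = true
```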

Testing Strategy

Unit Tests

  • Each module: tests/<module>_tests.rs
  • Mock implementations for hardware
  • Behavior locked down for safe refactoring

Integration Tests

  • tests/integration_test.rs - Module interactions
  • tests/automated_integration_tests.rs - Full pipeline (synthetic audio)

Architecture Tests

  • tests/architecture_test.rs - Enforce conventions
  • Deep module validation (<10 public items)
  • Documentation requirements

Technology Stack

| Component     | Technology            | Why                      |
|---------------|-----------------------|--------------------------|
| Language      | Rust                  | Safety, performance, async |
| Async Runtime | Tokio                 | Non-blocking I/O         |
| Audio         | ALSA (arecord/aplay)  | Pi compatibility         |
| Wake Word     | ONNX Runtime          | Fast inference           |
| STT           | whisper.cpp           | Local, accurate          |
| LLM           | Ollama                | Self-hosted, no API keys |
| TTS           | Piper                 | Fast, neural, local      |
| GPIO          | gpioset/gpioget       | Hardware control         |

Design Principles

1. Deep Modules

Every module exposes a simple interface that hides a complex implementation

  • Example: WakeWordDetector (3 methods) vs 156 lines of ONNX/resampling code
  • Pattern: Public interface in mod.rs, implementation in imp/

2. Platform Abstraction

Works on Mac (dev) and Pi (prod) without changes

  • GPIO trait with real/mock implementations
  • Audio device configurable
  • Mock mode for all hardware

3. Fail Fast

Validation at startup, not runtime

  • Config validation on load
  • Hardware checks before main loop
  • Clear error messages

4. Observable

Structured logging at all transitions

  • tracing for structured logs
  • State machine transitions logged
  • Performance metrics

5. Privacy-First

No cloud dependencies for core functionality

  • All AI runs locally (Ollama, whisper.cpp, Piper)
  • No audio sent to external services
  • Optional: web search (user choice)

Module Dependencies

state_machine/
  ├── audio/
  ├── wakeword/
  ├── stt/
  ├── llm/
  ├── tts/
  ├── sounds/
  └── gpio/

Dependency Rules:

  • State machine coordinates all modules
  • Modules don’t depend on each other directly
  • All use config for shared settings
  • Clean separation allows mocking in tests

Production Deployment Architecture

For production deployment on Batocera/Raspberry Pi, Botface uses a sidecar pattern with openWakeWord running as an independent HTTP service.

What is the Sidecar Pattern?

The sidecar pattern is an architectural pattern in which a secondary process (the “sidecar”) runs alongside a main application to provide supporting functionality. The sidecar shares the same lifecycle as the main application but operates in a separate process, communicating via lightweight protocols like HTTP or gRPC.

Formal Definition: Microsoft Azure Architecture - Sidecar Pattern

“Deploy components of an application into a separate process or container to provide isolation and encapsulation.”

Key Characteristics:

  • Co-located: Sidecar runs on the same host as the main application
  • Separate Process: Isolated failure domain (if sidecar crashes, main app continues)
  • Shared Resources: Can access same filesystem, network, and devices
  • Language Agnostic: Main app and sidecar can use different languages/runtimes
  • Independent Lifecycle: Can be updated, restarted, or scaled independently

Why Sidecar for Botface?

We chose the sidecar pattern for wake word detection for three critical reasons:

1. Language Ecosystem Isolation

Wake word detection requires ONNX model inference and real-time audio processing. The Rust ecosystem for these tasks is limited compared to Python:

| Capability                    | Python               | Rust              |
|-------------------------------|----------------------|-------------------|
| ONNX Runtime                  | ✅ Mature, optimized | ⚠️ Basic bindings |
| openWakeWord                  | ✅ Battle-tested     | ❌ Not available  |
| Audio (sounddevice)           | ✅ Callback-based    | ⚠️ ALSA only      |
| NumPy/SciPy signal processing | ✅ Native            | ❌ Limited        |

Python’s mature ML/audio ecosystem provides better performance and reliability for wake word detection.

2. Audio Device Ownership

The sidecar owns all audio I/O (microphone access), providing:

  • Single point of control: One process manages the audio hardware
  • Buffer management: Python’s sounddevice library handles real-time audio callbacks efficiently
  • Isolation: Audio driver issues don’t crash the main Rust application
  • Device flexibility: Easy to swap audio backends (ALSA, PulseAudio, etc.)

3. Fault Isolation

If the wake word detector encounters issues (model loading, memory pressure, audio errors), the main Botface application continues running:

  • Graceful degradation: Botface falls back to button-based activation if sidecar unavailable
  • Independent restart: Can restart sidecar without stopping Botface
  • Simpler debugging: Separate logs for audio/wake-word vs. application logic

Architecture Diagram (Mermaid)

graph TB
    subgraph "Process Management"
        PM[botface-manager.sh<br/>or systemd]
    end

    subgraph "Wake Word Detection"
        WW[openWakeWord<br/>Python HTTP Service<br/>Port 8080]
        WW_API["/health - Health check"]
        WW_API2["/events - SSE stream"]
        WW_API3["/reset - Reset state"]
    end

    subgraph "Main Application"
        BF[Botface<br/>Rust Binary]
        SM[State Machine]
        STT[Speech-to-Text<br/>whisper.cpp]
        LLM[LLM Client<br/>Ollama]
        TTS[Text-to-Speech<br/>Piper]
    end

    subgraph "Shared Resources"
        LOGS[(Log Files<br/>/userdata/voice-assistant/logs/)]
        MODELS[(Models<br/>ONNX/GGML)]
    end

    PM -->|Manages| WW
    PM -->|Manages| BF

    WW -->|SSE Events| BF
    BF -->|HTTP POST| WW

    BF --> SM
    SM --> STT
    SM --> LLM
    SM --> TTS

    WW -.->|Logs| LOGS
    BF -.->|Logs| LOGS
    WW -.->|Loads| MODELS
    BF -.->|Uses| MODELS

Deployment Flow

  1. Process Manager (botface-manager.sh or systemd) starts both services
  2. openWakeWord starts first and exposes HTTP API on port 8080
  3. Botface connects to openWakeWord via HTTP/SSE
  4. Wake word events stream from Python to Rust via Server-Sent Events
  5. Both services write logs to shared log directory
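
Step 4 hinges on parsing the Server-Sent Events stream. A minimal sketch of extracting one event block, assuming the standard SSE `event:`/`data:` field syntax; the actual event names and payloads are defined by the sidecar's /events endpoint:

```rust
/// Hypothetical sketch: parse one SSE event block into (event, data).
/// Multi-line `data:` fields are concatenated; comments and other SSE
/// fields (`id:`, `retry:`) are ignored.
fn parse_sse_event(block: &str) -> Option<(String, String)> {
    let mut event = String::from("message"); // SSE default event type
    let mut data = String::new();
    for line in block.lines() {
        if let Some(rest) = line.strip_prefix("event:") {
            event = rest.trim().to_string();
        } else if let Some(rest) = line.strip_prefix("data:") {
            data.push_str(rest.trim());
        }
    }
    if data.is_empty() { None } else { Some((event, data)) }
}
```

In the real client this would run over a long-lived HTTP response body, splitting on blank lines between event blocks.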

Service Management

# Start both services
/userdata/voice-assistant/botface-manager.sh start

# Check status
/userdata/voice-assistant/botface-manager.sh status

# View logs
/userdata/voice-assistant/botface-manager.sh logs

# Stop
/userdata/voice-assistant/botface-manager.sh stop

Why Sidecar Pattern?

  • Language isolation - Python crashes don’t bring down Rust app
  • Independent updates - Update wake word model without touching main app
  • Health monitoring - Each service can be monitored independently
  • Resource management - Separate resource limits for each component

Future Enhancements

Near-term

  • Streaming STT (process audio while user still speaking)
  • Multi-turn conversations (context memory)
  • Voice activity detection (VAD)
  • Better error recovery

Long-term

  • Multiple wake words
  • Speaker recognition (who is speaking)
  • Custom voice models
  • Tool calling (control smart home, etc.)

Graybox Pattern Application

All modules follow Matt Pocock’s deep module pattern:

wakeword/
├── mod.rs          # Public: 3 methods
└── imp/
    └── mod.rs      # Private: 156 lines implementation

Benefits:

  • AI navigates codebase in seconds
  • Tests lock behavior (safe refactoring)
  • Clear entry points
  • Progressive disclosure

See Also

  • AGENTS.md - Coding guidelines for AI assistants
  • context/v1.0/PATTERNS.md - Agentic workflow patterns
  • docs/ai-readiness.md - Architecture improvements
  • docs/codebase-audit.md - Comparison to best practices

Architecture version: 1.0
Pocock Score: 10/10 (deep modules throughout)
Tests: 86 passing (unit + integration + architecture)

Module: vision

Location: src/vision/

Description: [Auto-detected module - please add description]

Public Interface: [Please document public API]

Dependencies: [Please list dependencies]

AI Context:

  • [Add guidance for AI working with this module]
  • [Document common modification tasks]
  • [Note testing requirements]