Botface Architecture

Project: Botface - Rust Voice Assistant for Batocera/Raspberry Pi
Status: Active Development
Last Updated: March 2026


System Overview

Botface is a voice-controlled AI assistant that runs on Raspberry Pi with Batocera Linux. It provides hands-free interaction through wake word detection, speech recognition, AI language model integration, and text-to-speech responses.


Core Components

1. Audio Subsystem (audio/)

Purpose: Capture microphone input and play back responses
Pattern: Graybox - simple AudioCapture interface, complex ALSA implementation hidden

Interface:

  • AudioCapture::new() - Configure capture
  • start_continuous() - Stream audio chunks
  • ContinuousHandle - Stop recording

Hardware:

  • Raspberry Pi: ALSA via arecord/aplay subprocesses
  • Local dev: Any audio device (macOS compatible)
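
As a sketch of the subprocess approach, this is roughly how an `AudioCapture` implementation might build the `arecord` invocation on the Pi. The function name and the specific flag values are illustrative assumptions, not the project's actual configuration:

```rust
use std::process::Command;

/// Hypothetical sketch: assemble the `arecord` command a capture
/// implementation might spawn. Flag values are assumptions.
fn build_arecord_command(device: &str, sample_rate: u32) -> Command {
    let mut cmd = Command::new("arecord");
    cmd.args([
        "-D", device,           // ALSA device, e.g. "plughw:1,0"
        "-f", "S16_LE",         // 16-bit little-endian PCM
        "-r", &sample_rate.to_string(),
        "-c", "1",              // mono
        "-t", "raw",            // raw samples on stdout for streaming
    ]);
    cmd
}
```

Streaming then amounts to reading the child's stdout in fixed-size chunks and handing them to the wake word detector.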

2. Wake Word Detection (wakeword/)

Purpose: Detect “Hey Jarvis” wake phrase
Pattern: Graybox - WakeWordDetector struct, ONNX inference hidden

Interface:

  • WakeWordDetector::new() - Load ONNX model
  • predict() - Check audio chunk for wake word
  • reset() - Clear buffer after detection

Implementation:

  • ONNX Runtime for inference
  • Resampling: 48kHz → 16kHz via rubato
  • Prediction buffer accumulation (not immediate results)
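
The buffer-accumulation idea can be sketched without the ONNX machinery: instead of firing on a single model score, keep a short rolling window and report a detection only when the averaged score crosses a threshold. The struct name, window size, and threshold below are illustrative assumptions:

```rust
/// Hypothetical sketch of prediction-buffer accumulation (not the
/// project's actual implementation).
struct PredictionBuffer {
    scores: Vec<f32>,
    window: usize,
    threshold: f32,
}

impl PredictionBuffer {
    fn new(window: usize, threshold: f32) -> Self {
        Self { scores: Vec::new(), window, threshold }
    }

    /// Push one per-chunk score; true once the rolling average over the
    /// last `window` chunks exceeds `threshold`.
    fn push(&mut self, score: f32) -> bool {
        self.scores.push(score);
        if self.scores.len() < self.window {
            return false;
        }
        let tail = &self.scores[self.scores.len() - self.window..];
        tail.iter().sum::<f32>() / self.window as f32 > self.threshold
    }

    /// Clear accumulated scores after a detection (mirrors `reset()`).
    fn reset(&mut self) {
        self.scores.clear();
    }
}
```

Averaging over a window suppresses one-chunk false positives at the cost of a few chunks of latency.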

3. Speech-to-Text (stt/)

Purpose: Convert speech audio to text
Pattern: Graybox - SttEngine interface, whisper.cpp hidden

Interface:

  • SttEngine::new() - Initialize with model
  • transcribe() - Audio → Text
  • supported_languages() - Query capabilities

Implementation:

  • whisper.cpp subprocess (local, no cloud)
  • WAV input file → text output
  • Language auto-detection
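
A sketch of how an `SttEngine` might shell out to whisper.cpp. The binary path and flags are assumptions; check your whisper.cpp build's `--help` for the exact options it accepts:

```rust
use std::process::Command;

/// Hypothetical sketch: assemble a whisper.cpp invocation.
/// Flags shown are assumptions about the CLI.
fn build_whisper_command(binary: &str, model: &str, wav_path: &str) -> Command {
    let mut cmd = Command::new(binary);
    cmd.args([
        "-m", model,     // GGML model file
        "-f", wav_path,  // 16 kHz mono WAV input
        "-l", "auto",    // language auto-detection
        "-nt",           // no timestamps: plain text output
    ]);
    cmd
}
```

The engine would capture stdout, trim whitespace, and return the result as the transcription.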

4. Language Model (llm/)

Purpose: Generate AI responses to user queries
Pattern: Graybox - LlmClient interface, Ollama API hidden

Interface:

  • LlmClient::new() - Configure endpoint
  • chat() - Send message, get response
  • with_memory() - Enable conversation history
  • with_search() - Enable web search

Implementation:

  • HTTP client to local Ollama server
  • No API keys required (self-hosted)
  • Optional: conversation memory, web search
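
To make the HTTP layer concrete, here is a sketch of the JSON body an `LlmClient` might POST to a local Ollama server's `/api/chat` endpoint. The naive string formatting is for brevity only; a real client should use a JSON library (e.g. serde_json) so that quotes and newlines in the content are escaped correctly:

```rust
/// Hypothetical sketch of an Ollama /api/chat request body.
/// WARNING: no JSON escaping; illustration only.
fn chat_request_body(model: &str, system: &str, user: &str) -> String {
    format!(
        r#"{{"model":"{model}","stream":false,"messages":[{{"role":"system","content":"{system}"}},{{"role":"user","content":"{user}"}}]}}"#
    )
}
```

Because the server is self-hosted on localhost, no authentication header is needed, which is why the client stays so small.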

5. Text-to-Speech (tts/)

Purpose: Convert text responses to speech
Pattern: Graybox - TtsEngine interface, Piper hidden

Interface:

  • TtsEngine::new() - Load voice model
  • speak() - Text → Audio (PCM samples)
  • is_speaking() / stop() - Control playback

Implementation:

  • Piper TTS (fast, local neural TTS)
  • WAV output converted to PCM
  • Voice model caching
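
The WAV → PCM step can be sketched in a few lines: strip a canonical 44-byte RIFF header and reinterpret the payload as 16-bit little-endian samples. This is a simplification; real WAV files can carry extra chunks, so a production implementation should walk the chunk list rather than assume a fixed header size:

```rust
/// Hypothetical sketch: convert a canonical 44-byte-header WAV buffer
/// into i16 PCM samples. Not robust to extra RIFF chunks.
fn wav_to_pcm_i16(wav: &[u8]) -> Option<Vec<i16>> {
    const HEADER: usize = 44;
    if wav.len() < HEADER || wav[..4] != *b"RIFF" || wav[8..12] != *b"WAVE" {
        return None;
    }
    Some(
        wav[HEADER..]
            .chunks_exact(2) // two bytes per 16-bit sample
            .map(|b| i16::from_le_bytes([b[0], b[1]]))
            .collect(),
    )
}
```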

6. Sound Effects (sounds/)

Purpose: Audio feedback for state transitions
Pattern: Graybox - already clean interface

Interface:

  • SoundPlayer::new() - Configure directories
  • play_greeting() - Startup sound
  • play_ack() - Wake word detected
  • play_thinking() - Processing
  • play_error() - Something went wrong

Implementation:

  • Random selection from category directories
  • WAV files played via aplay
  • Can be disabled
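
Random selection needs no external crate; one sketch is to derive a pseudo-random index from the clock. The function name is illustrative; `candidates` would be the WAV files found in a category directory (e.g. an `ack/` subdirectory):

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical sketch: pick one sound from a category without the
/// `rand` crate, using clock nanoseconds as a cheap entropy source.
fn pick_sound(candidates: &[String]) -> Option<&String> {
    if candidates.is_empty() {
        return None; // category disabled or directory empty
    }
    let nanos = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.subsec_nanos() as usize)
        .unwrap_or(0);
    candidates.get(nanos % candidates.len())
}
```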

7. GPIO Control (gpio/)

Purpose: Hardware feedback (LED, button)
Pattern: Trait-based abstraction - Gpio trait

Interface:

  • Gpio::led_on() / led_off() - Visual feedback
  • Gpio::is_button_pressed() - Physical input
  • AiyHatMock - Test without hardware

Implementation:

  • Real: gpioset/gpioget via AIY Voice HAT
  • Mock: Console output only
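
The trait-based abstraction reduces to one trait and a mock for development machines. The method names mirror those listed above; the mock's fields and the exact signatures are assumptions for illustration:

```rust
/// Hypothetical sketch of the GPIO abstraction: the real AIY HAT
/// implementation would shell out to gpioset/gpioget, the mock only
/// records state and prints.
trait Gpio {
    fn led_on(&mut self);
    fn led_off(&mut self);
    fn is_button_pressed(&self) -> bool;
}

/// Mock used off-device: no hardware access.
struct AiyHatMock {
    led: bool,
    button: bool,
}

impl Gpio for AiyHatMock {
    fn led_on(&mut self) {
        self.led = true;
        println!("[mock] LED on");
    }
    fn led_off(&mut self) {
        self.led = false;
        println!("[mock] LED off");
    }
    fn is_button_pressed(&self) -> bool {
        self.button
    }
}
```

The state machine only ever sees `dyn Gpio`, so swapping real hardware for the mock is a construction-time decision.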

8. State Machine (state_machine.rs)

Purpose: Orchestrate the conversation flow
Pattern: Single file, clean state transitions

States:

Idle → Listening → Recording → Transcribing → Thinking → Speaking → Idle

Key Features:

  • Async/await throughout
  • Non-blocking I/O
  • Error recovery (transitions to Error state)
  • Activation counter (statistics)
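
The happy-path cycle above reduces to a simple enum walk. This sketch omits the async machinery and the inputs that actually drive each transition (wake word, transcription result, and so on); it only illustrates the state shape:

```rust
/// Hypothetical sketch of the state cycle; the real state_machine.rs is
/// async and event-driven.
#[derive(Debug, Clone, Copy, PartialEq)]
enum State {
    Idle,
    Listening,
    Recording,
    Transcribing,
    Thinking,
    Speaking,
    Error,
}

fn next(state: State) -> State {
    use State::*;
    match state {
        Idle => Listening,         // start continuous capture
        Listening => Recording,    // wake word detected
        Recording => Transcribing, // command captured
        Transcribing => Thinking,  // text sent to LLM
        Thinking => Speaking,      // response synthesized
        Speaking => Idle,          // playback finished
        Error => Idle,             // recovery path
    }
}
```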

Data Flow

┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│   Audio In  │────▶│  Wake Word   │────▶│  Recording  │
│ (Microphone)│     │  Detection   │     │   (STT)     │
└─────────────┘     └──────────────┘     └──────┬──────┘
                                                 │
                                                 ▼
┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│  Audio Out  │◀────│     TTS      │◀────│    LLM      │
│  (Speaker)  │     │ (Response)   │     │ (Thinking)  │
└─────────────┘     └──────────────┘     └─────────────┘

Flow:

  1. Continuous audio capture
  2. Wake word detection (“Hey Jarvis”)
  3. Recording user command
  4. STT transcription
  5. LLM generates response
  6. TTS synthesizes speech
  7. Audio playback

Configuration

File: config.toml (TOML format)

Sections:

  • [audio] - Sample rate, device, format
  • [wakeword] - Model path, threshold
  • [stt] - Whisper binary, model, language
  • [llm] - Ollama URL, model, system prompt
  • [tts] - Piper binary, voice model
  • [gpio] - Pin numbers, mock mode
  • [sounds] - Sound directories, enabled
  • [dev_mode] - Local testing flags

Environment-specific:

  • Pi/Batocera: Uses hardware pins, ALSA
  • Local dev: Mock GPIO, any audio device
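
A minimal config.toml sketch covering a few of the sections above. The key names are plausible assumptions for illustration; the actual keys are defined by the project's config loader:

```toml
# Hypothetical sketch; key names are assumptions, not the real schema.
[audio]
device = "plughw:1,0"
sample_rate = 48000

[wakeword]
model_path = "models/hey_jarvis.onnx"
threshold = 0.5

[llm]
url = "http://127.0.0.1:11434"
model = "llama3"

[gpio]
mock = true
```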

Testing Strategy

Unit Tests

  • Each module: tests/<module>_tests.rs
  • Mock implementations for hardware
  • Behavior locked down for safe refactoring

Integration Tests

  • tests/integration_test.rs - Module interactions
  • tests/automated_integration_tests.rs - Full pipeline (synthetic audio)

Architecture Tests

  • tests/architecture_test.rs - Enforce conventions
  • Deep module validation (<10 public items)
  • Documentation requirements

Technology Stack

| Component     | Technology            | Why                      |
|---------------|-----------------------|--------------------------|
| Language      | Rust                  | Safety, performance, async |
| Async Runtime | Tokio                 | Non-blocking I/O         |
| Audio         | ALSA (arecord/aplay)  | Pi compatibility         |
| Wake Word     | ONNX Runtime          | Fast inference           |
| STT           | whisper.cpp           | Local, accurate          |
| LLM           | Ollama                | Self-hosted, no API keys |
| TTS           | Piper                 | Fast, neural, local      |
| GPIO          | gpioset/gpioget       | Hardware control         |

Design Principles

1. Deep Modules

Every module exposes a simple interface that hides a complex implementation

  • Example: WakeWordDetector (3 methods) vs 156 lines of ONNX/resampling code
  • Pattern: Public interface in mod.rs, implementation in imp/

2. Platform Abstraction

Works on Mac (dev) and Pi (prod) without changes

  • GPIO trait with real/mock implementations
  • Audio device configurable
  • Mock mode for all hardware

3. Fail Fast

Validation at startup, not runtime

  • Config validation on load
  • Hardware checks before main loop
  • Clear error messages

4. Observable

Structured logging at all transitions

  • tracing for structured logs
  • State machine transitions logged
  • Performance metrics

5. Privacy-First

No cloud dependencies for core functionality

  • All AI runs locally (Ollama, whisper.cpp, Piper)
  • No audio sent to external services
  • Optional: web search (user choice)

Module Dependencies

state_machine/
  ├── audio/
  ├── wakeword/
  ├── stt/
  ├── llm/
  ├── tts/
  ├── sounds/
  └── gpio/

Dependency Rules:

  • State machine coordinates all modules
  • Modules don’t depend on each other directly
  • All use config for shared settings
  • Clean separation allows mocking in tests

Production Deployment Architecture

For production deployment on Batocera/Raspberry Pi, Botface uses a sidecar pattern with openWakeWord running as an independent HTTP service.

What is the Sidecar Pattern?

The sidecar pattern is an architectural pattern in which a secondary process (the “sidecar”) runs alongside a main application to provide supporting functionality. The sidecar shares the same lifecycle as the main application but operates in a separate process, communicating via lightweight protocols like HTTP or gRPC.

Formal Definition: Microsoft Azure Architecture - Sidecar Pattern

“Deploy components of an application into a separate process or container to provide isolation and encapsulation.”

Key Characteristics:

  • Co-located: Sidecar runs on the same host as the main application
  • Separate Process: Isolated failure domain (if sidecar crashes, main app continues)
  • Shared Resources: Can access same filesystem, network, and devices
  • Language Agnostic: Main app and sidecar can use different languages/runtimes
  • Independent Lifecycle: Can be updated, restarted, or scaled independently

Why Sidecar for Botface?

We chose the sidecar pattern for wake word detection for three critical reasons:

1. Language Ecosystem Isolation

Wake word detection requires ONNX model inference and real-time audio processing. The Rust ecosystem for these tasks is limited compared to Python:

| Capability                    | Python               | Rust              |
|-------------------------------|----------------------|-------------------|
| ONNX Runtime                  | ✅ Mature, optimized | ⚠️ Basic bindings |
| openWakeWord                  | ✅ Battle-tested     | ❌ Not available  |
| Audio (sounddevice)           | ✅ Callback-based    | ⚠️ ALSA only      |
| NumPy/SciPy signal processing | ✅ Native            | ❌ Limited        |

Python’s mature ML/audio ecosystem provides better performance and reliability for wake word detection.

2. Audio Device Ownership

The sidecar owns all audio I/O (microphone access), providing:

  • Single point of control: One process manages the audio hardware
  • Buffer management: Python’s sounddevice library handles real-time audio callbacks efficiently
  • Isolation: Audio driver issues don’t crash the main Rust application
  • Device flexibility: Easy to swap audio backends (ALSA, PulseAudio, etc.)

3. Fault Isolation

If the wake word detector encounters issues (model loading, memory pressure, audio errors), the main Botface application continues running:

  • Graceful degradation: Botface falls back to button-based activation if sidecar unavailable
  • Independent restart: Can restart sidecar without stopping Botface
  • Simpler debugging: Separate logs for audio/wake-word vs. application logic

Architecture Diagram (Mermaid)

graph TB
    subgraph "Process Management"
        PM[botface-manager.sh<br/>or systemd]
    end

    subgraph "Wake Word Detection"
        WW[openWakeWord<br/>Python HTTP Service<br/>Port 8080]
        WW_API["/health - Health check"]
        WW_API2["/events - SSE stream"]
        WW_API3["/reset - Reset state"]
    end

    subgraph "Main Application"
        BF[Botface<br/>Rust Binary]
        SM[State Machine]
        STT[Speech-to-Text<br/>whisper.cpp]
        LLM[LLM Client<br/>Ollama]
        TTS[Text-to-Speech<br/>Piper]
    end

    subgraph "Shared Resources"
        LOGS[(Log Files<br/>/userdata/voice-assistant/logs/)]
        MODELS[(Models<br/>ONNX/GGML)]
    end

    PM -->|Manages| WW
    PM -->|Manages| BF

    WW -->|SSE Events| BF
    BF -->|HTTP POST| WW

    BF --> SM
    SM --> STT
    SM --> LLM
    SM --> TTS

    WW -.->|Logs| LOGS
    BF -.->|Logs| LOGS
    WW -.->|Loads| MODELS
    BF -.->|Uses| MODELS

Deployment Flow

  1. Process Manager (botface-manager.sh or systemd) starts both services
  2. openWakeWord starts first and exposes HTTP API on port 8080
  3. Botface connects to openWakeWord via HTTP/SSE
  4. Wake word events stream from Python to Rust via Server-Sent Events
  5. Both services write logs to shared log directory
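
Step 4 hinges on parsing the Server-Sent Events stream. A minimal sketch of extracting one event block, assuming the standard SSE `event:`/`data:` field syntax; the actual event names and payloads are defined by the sidecar's /events endpoint:

```rust
/// Hypothetical sketch: parse one SSE event block into (event, data).
/// Multi-line `data:` fields are concatenated; comments and other SSE
/// fields (`id:`, `retry:`) are ignored.
fn parse_sse_event(block: &str) -> Option<(String, String)> {
    let mut event = String::from("message"); // SSE default event type
    let mut data = String::new();
    for line in block.lines() {
        if let Some(rest) = line.strip_prefix("event:") {
            event = rest.trim().to_string();
        } else if let Some(rest) = line.strip_prefix("data:") {
            data.push_str(rest.trim());
        }
    }
    if data.is_empty() { None } else { Some((event, data)) }
}
```

In the real client this would run over a long-lived HTTP response body, splitting on blank lines between event blocks.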

Service Management

# Start both services
/userdata/voice-assistant/botface-manager.sh start

# Check status
/userdata/voice-assistant/botface-manager.sh status

# View logs
/userdata/voice-assistant/botface-manager.sh logs

# Stop
/userdata/voice-assistant/botface-manager.sh stop

Why Sidecar Pattern?

  • Language isolation - Python crashes don’t bring down Rust app
  • Independent updates - Update wake word model without touching main app
  • Health monitoring - Each service can be monitored independently
  • Resource management - Separate resource limits for each component

Future Enhancements

Near-term

  • Streaming STT (process audio while user still speaking)
  • Multi-turn conversations (context memory)
  • Voice activity detection (VAD)
  • Better error recovery

Long-term

  • Multiple wake words
  • Speaker recognition (who is speaking)
  • Custom voice models
  • Tool calling (control smart home, etc.)

Graybox Pattern Application

All modules follow Matt Pocock’s deep module pattern:

wakeword/
├── mod.rs          # Public: 3 methods
└── imp/
    └── mod.rs      # Private: 156 lines implementation

Benefits:

  • AI navigates codebase in seconds
  • Tests lock behavior (safe refactoring)
  • Clear entry points
  • Progressive disclosure

See Also

  • AGENTS.md - Coding guidelines for AI assistants
  • context/v1.0/PATTERNS.md - Agentic workflow patterns
  • docs/ai-readiness.md - Architecture improvements
  • docs/codebase-audit.md - Comparison to best practices

Architecture version: 1.0
Pocock Score: 10/10 (deep modules throughout)
Tests: 86 passing (unit + integration + architecture)

Module: vision

Location: src/vision/

Description: [Auto-detected module - please add description]

Public Interface: [Please document public API]

Dependencies: [Please list dependencies]

AI Context:

  • [Add guidance for AI working with this module]
  • [Document common modification tasks]
  • [Note testing requirements]