Botface Architecture
Project: Botface - Rust Voice Assistant for Batocera/Raspberry Pi
Status: Active Development
Last Updated: March 2026
System Overview
Botface is a voice-controlled AI assistant that runs on Raspberry Pi with Batocera Linux. It provides hands-free interaction through wake word detection, speech recognition, AI language model integration, and text-to-speech responses.
Core Components
1. Audio Subsystem (audio/)
Purpose: Capture microphone input and playback responses
Pattern: Graybox - simple AudioCapture interface, complex ALSA implementation hidden
Interface:
- `AudioCapture::new()` - Configure capture
- `start_continuous()` - Stream audio chunks
- `ContinuousHandle` - Stop recording
Hardware:
- Raspberry Pi: ALSA via `arecord`/`aplay` subprocesses
- Local dev: Any audio device (macOS compatible)
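Since capture is delegated to an `arecord` subprocess on the Pi, the core of the implementation is just building the right command line. A minimal sketch, assuming standard `arecord` flags; the device name and rate are illustrative and would come from `config.toml`:

```rust
use std::process::Command;

/// Build the `arecord` invocation used for continuous capture.
/// Device and rate are illustrative; real values come from config.toml.
fn arecord_command(device: &str, rate_hz: u32) -> Command {
    let mut cmd = Command::new("arecord");
    cmd.args([
        "-D", device,               // ALSA device, e.g. "plughw:1,0"
        "-f", "S16_LE",             // 16-bit little-endian PCM
        "-r", &rate_hz.to_string(), // sample rate in Hz
        "-c", "1",                  // mono
        "-t", "raw",                // raw samples on stdout, no WAV header
    ]);
    cmd
}
```

`start_continuous()` would spawn this command and stream chunks from its stdout.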
2. Wake Word Detection (wakeword/)
Purpose: Detect “Hey Jarvis” wake phrase
Pattern: Graybox - WakeWordDetector struct, ONNX inference hidden
Interface:
- `WakeWordDetector::new()` - Load ONNX model
- `predict()` - Check audio chunk for wake word
- `reset()` - Clear buffer after detection
Implementation:
- ONNX Runtime for inference
- Resampling: 48kHz → 16kHz via rubato
- Prediction buffer accumulation (not immediate results)
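The real implementation resamples with `rubato`; the rate conversion itself is just a 3:1 reduction. A naive sketch that averages each triple of samples as a crude anti-aliasing step, purely to illustrate the 48 kHz → 16 kHz ratio:

```rust
/// Naive 3:1 decimation from 48 kHz to 16 kHz, averaging each triple of
/// samples as a crude anti-aliasing step. The real implementation uses
/// the `rubato` resampler; this only illustrates the rate conversion.
fn downsample_48k_to_16k(input: &[f32]) -> Vec<f32> {
    input
        .chunks_exact(3) // any trailing 1-2 samples are dropped
        .map(|c| (c[0] + c[1] + c[2]) / 3.0)
        .collect()
}
```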
3. Speech-to-Text (stt/)
Purpose: Convert speech audio to text
Pattern: Graybox - SttEngine interface, whisper.cpp hidden
Interface:
- `SttEngine::new()` - Initialize with model
- `transcribe()` - Audio → Text
- `supported_languages()` - Query capabilities
Implementation:
- whisper.cpp subprocess (local, no cloud)
- WAV input file → text output
- Language auto-detection
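Since transcription shells out to whisper.cpp, `transcribe()` mostly amounts to assembling the CLI arguments. A sketch assuming common whisper.cpp flags (`-m`, `-f`, `-l`, `-otxt`); verify these against the binary version actually shipped:

```rust
/// Build the argument list for a whisper.cpp transcription run.
/// Flag names are assumed from the whisper.cpp CLI; verify against the
/// binary you ship.
fn whisper_args(model: &str, wav: &str, language: Option<&str>) -> Vec<String> {
    let mut args = vec![
        "-m".into(), model.into(), // GGML model path
        "-f".into(), wav.into(),   // input WAV file
        "-otxt".into(),            // write plain-text output next to the WAV
    ];
    // Language auto-detection when none is configured.
    let lang = language.unwrap_or("auto");
    args.push("-l".into());
    args.push(lang.into());
    args
}
```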
4. Language Model (llm/)
Purpose: Generate AI responses to user queries
Pattern: Graybox - LlmClient interface, Ollama API hidden
Interface:
- `LlmClient::new()` - Configure endpoint
- `chat()` - Send message, get response
- `with_memory()` - Enable conversation history
- `with_search()` - Enable web search
Implementation:
- HTTP client to local Ollama server
- No API keys required (self-hosted)
- Optional: conversation memory, web search
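Ollama exposes a simple JSON-over-HTTP chat endpoint (`POST /api/chat`). A sketch of the request body `chat()` would send; the real client would build this with `serde_json` rather than string formatting, and this sketch assumes `prompt` needs no JSON escaping:

```rust
/// Build a minimal Ollama /api/chat request body by hand. The real client
/// would use serde_json; this only shows the wire format. Assumes `prompt`
/// contains no characters that need JSON escaping.
fn ollama_chat_body(model: &str, prompt: &str) -> String {
    format!(
        r#"{{"model":"{model}","messages":[{{"role":"user","content":"{prompt}"}}],"stream":false}}"#
    )
}
```

This body would be POSTed to the configured endpoint, e.g. `http://localhost:11434/api/chat`.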
5. Text-to-Speech (tts/)
Purpose: Convert text responses to speech
Pattern: Graybox - TtsEngine interface, Piper hidden
Interface:
- `TtsEngine::new()` - Load voice model
- `speak()` - Text → Audio (PCM samples)
- `is_speaking()`/`stop()` - Control playback
Implementation:
- Piper TTS (fast, local neural TTS)
- WAV output converted to PCM
- Voice model caching
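The WAV-to-PCM conversion step is a straightforward byte reinterpretation. A sketch that decodes little-endian 16-bit samples, assuming the caller has already skipped past the WAV header:

```rust
/// Convert little-endian 16-bit PCM bytes (the data chunk of a Piper WAV)
/// into i16 samples. Header parsing is omitted; this assumes the caller
/// has already skipped the 44-byte canonical WAV header.
fn pcm16_from_le_bytes(data: &[u8]) -> Vec<i16> {
    data.chunks_exact(2) // a trailing odd byte would be dropped
        .map(|pair| i16::from_le_bytes([pair[0], pair[1]]))
        .collect()
}
```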
6. Sound Effects (sounds/)
Purpose: Audio feedback for state transitions
Pattern: Graybox - already clean interface
Interface:
- `SoundPlayer::new()` - Configure directories
- `play_greeting()` - Startup sound
- `play_ack()` - Wake word detected
- `play_thinking()` - Processing
- `play_error()` - Something went wrong
Implementation:
- Random selection from category directories
- WAV files played via `aplay`
- Can be disabled
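The random selection can be done without a dedicated RNG crate by seeding from the clock. A sketch under the assumption that the category's files have already been scanned from disk (the real `SoundPlayer` reads the directories itself):

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Pick one sound file from a category, seeded by the clock so repeated
/// activations vary. The real SoundPlayer scans a directory; the slice of
/// file names here is illustrative.
fn pick_sound<'a>(files: &'a [&'a str]) -> Option<&'a str> {
    if files.is_empty() {
        return None; // category disabled or directory empty
    }
    let nanos = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .unwrap_or_default()
        .subsec_nanos() as usize;
    Some(files[nanos % files.len()])
}
```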
7. GPIO Control (gpio/)
Purpose: Hardware feedback (LED, button)
Pattern: Trait-based abstraction - Gpio trait
Interface:
- `Gpio::led_on()`/`led_off()` - Visual feedback
- `Gpio::is_button_pressed()` - Physical input
- `AiyHatMock` - Test without hardware
Implementation:
- Real: `gpioset`/`gpioget` via AIY Voice HAT
- Mock: Console output only
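The trait-based abstraction can be sketched directly from the interface above. Method names match the doc; the mock's bodies are illustrative, not the project's actual implementation:

```rust
/// Sketch of the Gpio trait from the interface above; method names match
/// the doc, implementations are illustrative.
trait Gpio {
    fn led_on(&mut self);
    fn led_off(&mut self);
    fn is_button_pressed(&self) -> bool;
}

/// Console-only mock for development machines without an AIY Voice HAT.
struct AiyHatMock {
    led_lit: bool,
}

impl Gpio for AiyHatMock {
    fn led_on(&mut self) {
        self.led_lit = true;
        println!("[mock gpio] LED on");
    }
    fn led_off(&mut self) {
        self.led_lit = false;
        println!("[mock gpio] LED off");
    }
    fn is_button_pressed(&self) -> bool {
        false // no physical button in mock mode
    }
}
```

The state machine would hold a `Box<dyn Gpio>`, so tests and local dev swap in `AiyHatMock` without touching any other code.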
8. State Machine (state_machine.rs)
Purpose: Orchestrate the conversation flow
Pattern: Single file, clean state transitions
States:
Idle → Listening → Recording → Transcribing → Thinking → Speaking → Idle
Key Features:
- Async/await throughout
- Non-blocking I/O
- Error recovery (transitions to Error state)
- Activation counter (statistics)
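The state cycle above can be sketched as a plain enum with a happy-path transition function. This is only a sketch: the real state machine is async, logs each transition, and includes the Error state mentioned above:

```rust
/// The conversation states from the cycle above. Happy path only; the
/// real state machine is async and also has an Error state.
#[derive(Debug, Clone, Copy, PartialEq)]
enum State {
    Idle,
    Listening,
    Recording,
    Transcribing,
    Thinking,
    Speaking,
}

fn next(state: State) -> State {
    match state {
        State::Idle => State::Listening,
        State::Listening => State::Recording,    // wake word detected
        State::Recording => State::Transcribing, // user finished speaking
        State::Transcribing => State::Thinking,  // text handed to the LLM
        State::Thinking => State::Speaking,      // response synthesized
        State::Speaking => State::Idle,          // playback done
    }
}
```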
Data Flow
```
┌─────────────┐      ┌──────────────┐      ┌─────────────┐
│  Audio In   │────▶│  Wake Word   │────▶│  Recording  │
│ (Microphone)│      │  Detection   │      │    (STT)    │
└─────────────┘      └──────────────┘      └──────┬──────┘
                                                  │
                                                  ▼
┌─────────────┐      ┌──────────────┐      ┌─────────────┐
│  Audio Out  │◀────│     TTS      │◀────│     LLM     │
│  (Speaker)  │      │  (Response)  │      │ (Thinking)  │
└─────────────┘      └──────────────┘      └─────────────┘
```
Flow:
- Continuous audio capture
- Wake word detection (“Hey Jarvis”)
- Recording user command
- STT transcription
- LLM generates response
- TTS synthesizes speech
- Audio playback
Configuration
File: config.toml (TOML format)
Sections:
- `[audio]` - Sample rate, device, format
- `[wakeword]` - Model path, threshold
- `[stt]` - Whisper binary, model, language
- `[llm]` - Ollama URL, model, system prompt
- `[tts]` - Piper binary, voice model
- `[gpio]` - Pin numbers, mock mode
- `[sounds]` - Sound directories, enabled
- `[dev_mode]` - Local testing flags
Environment-specific:
- Pi/Batocera: Uses hardware pins, ALSA
- Local dev: Mock GPIO, any audio device
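A fragment showing what these sections might look like in practice. Section names come from the list above; the key names and values are illustrative, not the project's actual schema:

```toml
# Illustrative config.toml fragment; key names are examples only.
[audio]
sample_rate = 48000
device = "plughw:1,0"

[wakeword]
model_path = "models/hey_jarvis.onnx"
threshold = 0.5

[llm]
url = "http://localhost:11434"
model = "llama3"

[gpio]
mock = true   # local dev: no real hardware
```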
Testing Strategy
Unit Tests
- Each module: `tests/<module>_tests.rs`
- Mock implementations for hardware
- Behavior locked down for safe refactoring
Integration Tests
- `tests/integration_test.rs` - Module interactions
- `tests/automated_integration_tests.rs` - Full pipeline (synthetic audio)
Architecture Tests
- `tests/architecture_test.rs` - Enforce conventions
- Deep module validation (<10 public items)
- Documentation requirements
Technology Stack
| Component | Technology | Why |
|---|---|---|
| Language | Rust | Safety, performance, async |
| Async Runtime | Tokio | Non-blocking I/O |
| Audio | ALSA (arecord/aplay) | Pi compatibility |
| Wake Word | ONNX Runtime | Fast inference |
| STT | whisper.cpp | Local, accurate |
| LLM | Ollama | Self-hosted, no API keys |
| TTS | Piper | Fast, neural, local |
| GPIO | Linux sysfs | Hardware control |
Design Principles
1. Deep Modules
Every module has simple interface hiding complex implementation
- Example: `WakeWordDetector` (3 methods) vs 156 lines of ONNX/resampling code
- Pattern: Public interface in `mod.rs`, implementation in `imp/`
2. Platform Abstraction
Works on Mac (dev) and Pi (prod) without changes
- GPIO trait with real/mock implementations
- Audio device configurable
- Mock mode for all hardware
3. Fail Fast
Validation at startup, not runtime
- Config validation on load
- Hardware checks before main loop
- Clear error messages
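The fail-fast idea can be made concrete with a validation pass over the loaded config. A sketch with illustrative field names, not the project's real `Config` struct:

```rust
/// Fail-fast validation sketch: reject a bad config at startup rather than
/// mid-conversation. Field names are illustrative, not the real Config.
struct Config {
    wakeword_model: String,
    wakeword_threshold: f32,
    ollama_url: String,
}

fn validate(cfg: &Config) -> Result<(), String> {
    if cfg.wakeword_model.is_empty() {
        return Err("wakeword.model path is empty".into());
    }
    if !(0.0..=1.0).contains(&cfg.wakeword_threshold) {
        return Err(format!(
            "wakeword.threshold must be in 0.0..=1.0, got {}",
            cfg.wakeword_threshold
        ));
    }
    if !cfg.ollama_url.starts_with("http") {
        return Err("llm.url must be an http(s) endpoint".into());
    }
    Ok(())
}
```

Running this before the main loop means a misconfigured deployment fails with a clear message at boot, not after the first wake word.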
4. Observable
Structured logging at all transitions
- `tracing` for structured logs
- State machine transitions logged
- Performance metrics
5. Privacy-First
No cloud dependencies for core functionality
- All AI runs locally (Ollama, whisper.cpp, Piper)
- No audio sent to external services
- Optional: web search (user choice)
Module Dependencies
```
state_machine/
├── audio/
├── wakeword/
├── stt/
├── llm/
├── tts/
├── sounds/
└── gpio/
```
Dependency Rules:
- State machine coordinates all modules
- Modules don’t depend on each other directly
- All use `config` for shared settings
- Clean separation allows mocking in tests
Production Deployment Architecture
For production deployment on Batocera/Raspberry Pi, Botface uses a sidecar pattern with openWakeWord running as an independent HTTP service.
What is the Sidecar Pattern?
The sidecar pattern is an architectural pattern in which a secondary process (the “sidecar”) runs alongside a main application to provide supporting functionality. The sidecar shares the main application’s lifecycle but runs in a separate process, communicating over lightweight protocols such as HTTP or gRPC.
Formal Definition: Microsoft Azure Architecture - Sidecar Pattern
“Deploy components of an application into a separate process or container to provide isolation and encapsulation.”
Alternative References (non-vendor specific):
- Martin Fowler - Sidecar Pattern - Widely cited treatment in the software architecture literature
- Kubernetes Documentation - Sidecar Containers - Cloud-native implementation using pod patterns
- Cloud Native Computing Foundation (CNCF) - Sidecar Pattern - Cloud-native architectural pattern classification
- IBM Cloud Architecture - Sidecar Pattern - Enterprise pattern catalog
- IEEE Software Magazine - “Sidecars: A Pattern for Decoupling” - Academic treatment of the pattern
Key Characteristics:
- Co-located: Sidecar runs on the same host as the main application
- Separate Process: Isolated failure domain (if sidecar crashes, main app continues)
- Shared Resources: Can access same filesystem, network, and devices
- Language Agnostic: Main app and sidecar can use different languages/runtimes
- Independent Lifecycle: Can be updated, restarted, or scaled independently
Why Sidecar for Botface?
We chose the sidecar pattern for wake word detection for three critical reasons:
1. Language Ecosystem Isolation
Wake word detection requires ONNX model inference and real-time audio processing. The Rust ecosystem for these tasks is limited compared to Python:
| Capability | Python | Rust |
|---|---|---|
| ONNX Runtime | ✅ Mature, optimized | ⚠️ Basic bindings |
| openWakeWord | ✅ Battle-tested | ❌ Not available |
| Audio (sounddevice) | ✅ Callback-based | ⚠️ ALSA only |
| NumPy/SciPy signal processing | ✅ Native | ❌ Limited |
Python’s mature ML/audio ecosystem provides better performance and reliability for wake word detection.
2. Audio Device Ownership
The sidecar owns all audio I/O (microphone access), providing:
- Single point of control: One process manages the audio hardware
- Buffer management: Python’s sounddevice library handles real-time audio callbacks efficiently
- Isolation: Audio driver issues don’t crash the main Rust application
- Device flexibility: Easy to swap audio backends (ALSA, PulseAudio, etc.)
3. Fault Isolation
If the wake word detector encounters issues (model loading, memory pressure, audio errors), the main Botface application continues running:
- Graceful degradation: Botface falls back to button-based activation if sidecar unavailable
- Independent restart: Can restart sidecar without stopping Botface
- Simpler debugging: Separate logs for audio/wake-word vs. application logic
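The graceful-degradation decision reduces to a small piece of logic: check the sidecar's `/health` endpoint and pick an activation mode. A sketch with illustrative names:

```rust
/// Activation-mode selection sketch: fall back to the physical button when
/// the wake-word sidecar's /health check fails. Names are illustrative.
#[derive(Debug, PartialEq)]
enum Activation {
    WakeWord,
    ButtonOnly,
}

fn choose_activation(sidecar_healthy: bool) -> Activation {
    if sidecar_healthy {
        Activation::WakeWord
    } else {
        // Sidecar down: Botface keeps running with button-based activation.
        Activation::ButtonOnly
    }
}
```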
```mermaid
graph TB
subgraph "Process Management"
PM[botface-manager.sh<br/>or systemd]
end
subgraph "Wake Word Detection"
WW[openWakeWord<br/>Python HTTP Service<br/>Port 8080]
WW_API["/health - Health check"]
WW_API2["/events - SSE stream"]
WW_API3["/reset - Reset state"]
end
subgraph "Main Application"
BF[Botface<br/>Rust Binary]
SM[State Machine]
STT[Speech-to-Text<br/>whisper.cpp]
LLM[LLM Client<br/>Ollama]
TTS[Text-to-Speech<br/>Piper]
end
subgraph "Shared Resources"
LOGS[(Log Files<br/>/userdata/voice-assistant/logs/)]
MODELS[(Models<br/>ONNX/GGML)]
end
PM -->|Manages| WW
PM -->|Manages| BF
WW -->|SSE Events| BF
BF -->|HTTP POST| WW
BF --> SM
SM --> STT
SM --> LLM
SM --> TTS
WW -.->|Logs| LOGS
BF -.->|Logs| LOGS
WW -.->|Loads| MODELS
BF -.->|Uses| MODELS
```
Deployment Flow
- Process Manager (`botface-manager.sh` or systemd) starts both services
- openWakeWord starts first and exposes HTTP API on port 8080
- Botface connects to openWakeWord via HTTP/SSE
- Wake word events stream from Python to Rust via Server-Sent Events
- Both services write logs to shared log directory
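On the Rust side, consuming the event stream comes down to parsing SSE lines. A minimal sketch for the single-line `data:` case; real SSE also carries `event:`/`id:` fields, multi-line data, and `:`-prefixed keep-alive comments:

```rust
/// Minimal parser for one Server-Sent Events data line, as streamed from
/// the sidecar's /events endpoint. Handles only the single-line `data:`
/// case; comments (": keep-alive") and other fields yield None.
fn parse_sse_data(line: &str) -> Option<&str> {
    line.strip_prefix("data:").map(|rest| rest.trim_start())
}
```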
Service Management
```sh
# Start both services
/userdata/voice-assistant/botface-manager.sh start

# Check status
/userdata/voice-assistant/botface-manager.sh status

# View logs
/userdata/voice-assistant/botface-manager.sh logs

# Stop
/userdata/voice-assistant/botface-manager.sh stop
```
Why Sidecar Pattern?
- Language isolation - Python crashes don’t bring down Rust app
- Independent updates - Update wake word model without touching main app
- Health monitoring - Each service can be monitored independently
- Resource management - Separate resource limits for each component
Future Enhancements
Near-term
- Streaming STT (process audio while user still speaking)
- Multi-turn conversations (context memory)
- Voice activity detection (VAD)
- Better error recovery
Long-term
- Multiple wake words
- Speaker recognition (who is speaking)
- Custom voice models
- Tool calling (control smart home, etc.)
Graybox Pattern Application
All modules follow Matt Pocock’s deep module pattern:
```
wakeword/
├── mod.rs       # Public: 3 methods
└── imp/
    └── mod.rs   # Private: 156 lines implementation
```
Benefits:
- AI navigates codebase in seconds
- Tests lock behavior (safe refactoring)
- Clear entry points
- Progressive disclosure
See Also
- AGENTS.md - Coding guidelines for AI assistants
- context/v1.0/PATTERNS.md - Agentic workflow patterns
- docs/ai-readiness.md - Architecture improvements
- docs/codebase-audit.md - Comparison to best practices
Architecture version: 1.0
Pocock Score: 10/10 (deep modules throughout)
Tests: 86 passing (unit + integration + architecture)
Module: vision
Location: src/vision/
Description: [Auto-detected module - please add description]
Public Interface: [Please document public API]
Dependencies: [Please list dependencies]
AI Context:
- [Add guidance for AI working with this module]
- [Document common modification tasks]
- [Note testing requirements]