Botface - Voice Assistant for Raspberry Pi 5 + AIY Voice HAT

Offline voice-controlled AI assistant written in Rust for Raspberry Pi 5 + Google AIY Voice HAT v1 + Batocera.

Status: Core architecture complete, integrations in progress. Wake word detection, LED control, transcription, LLM responses, and audio playback are functional on the AIY Voice HAT.

Architecture

Botface uses a sidecar pattern for audio I/O:

  • Botface (Rust): Main state machine, LLM integration (Ollama), TTS (Piper), orchestration
  • Sidecar (Python): HTTP service handling wake word detection (openWakeWord) and audio recording
  • Communication: HTTP + SSE (Server-Sent Events) between Botface and sidecar

This architecture provides:

  • Language isolation (Python crashes don’t affect Rust)
  • Independent audio lifecycle
  • Better monitoring and health checks
User Speech → Sidecar (Python) → SSE Events → Botface (Rust)
                                              ↓
                         TTS Audio ← Piper ← LLM Response ← Ollama
                                              ↓
                           LED + AIY Voice HAT Speaker
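As a sketch of how the Rust side might consume the sidecar's SSE stream, here is a minimal `data:` line parser using only the standard library. The event name and JSON payload shape shown are assumptions for illustration, not the actual sidecar protocol.

```rust
// Minimal SSE parser: extracts `data:` payloads from a raw
// Server-Sent Events stream, as a Botface-style client might.
fn parse_sse_data(stream: &str) -> Vec<String> {
    stream
        .lines()
        .filter_map(|line| line.strip_prefix("data:"))
        .map(|payload| payload.trim().to_string())
        .collect()
}

fn main() {
    // Example stream as the sidecar might emit it (shape assumed).
    let raw = "event: wake\ndata: {\"score\": 0.93}\n\nevent: wake\ndata: {\"score\": 0.88}\n\n";
    let events = parse_sse_data(raw);
    assert_eq!(events, vec!["{\"score\": 0.93}", "{\"score\": 0.88}"]);
    println!("parsed {} wake events", events.len());
}
```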

Quick Start (Local Development on macOS)

One-Time Setup

# Download required models and binaries
./scripts/setup.sh --dev

# This downloads:
# - Wake word model (hey_jarvis.onnx)
# - Whisper binary (speech-to-text)
# - Whisper model (ggml-base.en.bin)
# - Creates default config.toml

Running the Assistant

cd Botface

# Run in development mode (mock GPIO, local audio)
cargo run

# Or with explicit flags
cargo run -- --mock-gpio --local-audio --verbose

# Check CLI help
cargo run -- --help

What happens in local mode:

  • Uses your Mac’s microphone via cpal
  • GPIO operations print to console instead of controlling hardware
  • Validates Ollama connection
  • Skips Pi-specific binary checks
  • Note: Sidecar not used in local dev mode (native wake word detection)

Production Build for Pi

Build and Deploy Workflow

CRITICAL: Build on macOS, deploy to Pi. Never build on the Pi.

# 1. Build for Raspberry Pi 5 (ARM64) on macOS
cross build --release --target aarch64-unknown-linux-gnu

# Binary location: target/aarch64-unknown-linux-gnu/release/botface

# 2. Deploy to Pi
scp target/aarch64-unknown-linux-gnu/release/botface \
   root@<pi-ip>:/userdata/voice-assistant/

# 3. Start services on Pi
ssh root@<pi-ip> "cd /userdata/voice-assistant && \
   python3 wakeword_sidecar.py --model models/hey_jarvis.onnx --threshold 0.5 --port 8080 & \
   ./botface"

See docs/INTEGRATION_ROADMAP.md for detailed deployment instructions.

Project Structure

├── Cargo.toml              # Dependencies & features
├── Cargo.lock              # Dependency lock file
├── Cross.toml              # Cross-compilation configuration
├── README.md               # This file
├── docs/                   # Additional documentation
│   ├── INTEGRATION_ROADMAP.md  # Deployment guide
│   ├── dev-log/            # Development session logs
│   └── ARCHITECTURE.md     # System design
├── assets/                 # Static assets
│   ├── sounds/             # WAV sound effects
│   └── models/             # ONNX models (not in git)
├── src/
│   ├── main.rs             # Entry point with CLI args
│   ├── lib.rs              # Library exports
│   ├── config.rs           # Configuration (local vs Pi)
│   ├── state_machine.rs    # Core state management
│   ├── sidecar/            # HTTP client for sidecar
│   ├── audio/              # Audio playback (TTS output)
│   ├── wakeword/           # Wake word (native, sidecar preferred)
│   ├── stt/                # Speech-to-text (whisper.cpp)
│   ├── llm/                # Language model (Ollama)
│   ├── tts/                # Text-to-speech (Piper)
│   ├── gpio/               # Hardware control (real + mock)
│   └── sounds/             # Sound effects
├── scripts/
│   ├── wakeword_sidecar.py # Python HTTP sidecar
│   ├── build.sh            # Cross-compile for Pi 5
│   └── deploy.sh           # Deploy to Pi via rsync
└── config.toml             # Configuration file

Verified Working Features

All components tested and verified on Raspberry Pi 5 + AIY Voice HAT v1:

  • Wake Word Detection: “Hey Jarvis” detected via sidecar (scores 0.85-0.99)
  • LED Control: Physical LED on AIY HAT (ON during recording, OFF when idle)
  • Audio Recording: 5-second clips captured via sidecar
  • Speech-to-Text: whisper.cpp transcribes with high accuracy
  • LLM Integration: Ollama generates contextual responses
  • Text-to-Speech: Piper synthesizes natural speech
  • Audio Playback: Verified working through AIY Voice HAT speaker (using aplay -D plughw:0,0)
  • State Machine: Full pipeline Idle → Wake → Record → Transcribe → Think → Speak → Idle

Development vs Production Modes

Local Development (macOS/Linux Desktop)

Features:

  • Audio Input: Uses cpal to capture from your Mac’s microphone
  • Audio Output: System default audio device
  • GPIO: Mock implementation (prints to console)
  • Wake Word: Native Rust (optional, sidecar not used)
  • Validation: Checks for Ollama, skips Pi-specific binaries

Useful for:

  • Testing state machine logic
  • Debugging LLM integration
  • Rapid iteration without deploying

Production (Raspberry Pi 5)

Features:

  • Audio Input: Sidecar (Python) with sounddevice + openWakeWord
  • Audio Output: aplay -D plughw:0,0 (direct to AIY Voice HAT)
  • GPIO: Real hardware control via gpioset/gpioget
  • Validation: Checks all binaries (whisper, piper, ollama, sidecar)

Deployed via:

  • Cross-compiled ARM64 binary on macOS
  • SCP to /userdata/voice-assistant/
  • Manual start of sidecar + botface

Configuration

The assistant automatically detects your platform and adjusts:

macOS (Local Dev):

[dev_mode]
enabled = true
mock_gpio = true
local_audio = true
skip_binary_checks = true

[audio]
device = "default"

[gpio]
mock_enabled = true

Raspberry Pi (Production):

[dev_mode]
enabled = false

[wakeword]
model_path = "/userdata/voice-assistant/models/hey_jarvis.onnx"
threshold = 0.5

[stt]
whisper_binary = "/userdata/voice-assistant/whisper-cli"
whisper_model = "/userdata/voice-assistant/models/ggml-base.en.bin"

[tts]
piper_binary = "/userdata/voice-assistant/piper/piper"
voice_model = "/userdata/voice-assistant/models/en_US-amy-medium.onnx"

[gpio]
mock_enabled = false
led_pin = 25

Create config.toml in project root for local testing, or in /userdata/voice-assistant/ on Pi.
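The platform detection above could be done with compile-time target information; the following is a simplified sketch of that idea (the `DevMode` struct and `default_dev_mode` function are illustrative names, and the real config logic may differ):

```rust
// Sketch of platform-based config defaults using compile-time
// target detection; the actual implementation may differ.
#[derive(Debug, PartialEq)]
struct DevMode {
    enabled: bool,
    mock_gpio: bool,
    local_audio: bool,
}

fn default_dev_mode() -> DevMode {
    // On macOS, default to development settings;
    // on Linux (the Pi under Batocera), default to production.
    let on_mac = cfg!(target_os = "macos");
    DevMode {
        enabled: on_mac,
        mock_gpio: on_mac,
        local_audio: on_mac,
    }
}

fn main() {
    let dev = default_dev_mode();
    println!("dev_mode enabled: {}", dev.enabled);
}
```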

Usage Examples

Local Development Mode

# Basic local run (auto-detects macOS)
cargo run

# With verbose logging
cargo run -- --verbose

# Skip dependency checks (faster startup)
cargo run -- --skip-checks

# Custom config
cargo run -- --config ./my-config.toml

Production Mode on Pi

# Set your Pi's IP address
PI_IP="192.168.X.X"

# On macOS - Build release binary for Pi
cross build --release --target aarch64-unknown-linux-gnu

# Deploy
scp target/aarch64-unknown-linux-gnu/release/botface \
   root@$PI_IP:/userdata/voice-assistant/

# On Pi - Start sidecar first, then botface
ssh root@$PI_IP "cd /userdata/voice-assistant && \
   python3 wakeword_sidecar.py --model models/hey_jarvis.onnx --threshold 0.5 --port 8080 > /tmp/sidecar.log 2>&1 & \
   export LD_LIBRARY_PATH=/userdata/voice-assistant:\$LD_LIBRARY_PATH && \
   ./botface > /tmp/botface.log 2>&1 &"

# View logs
ssh root@$PI_IP "tail -f /tmp/botface.log /tmp/sidecar.log"

Testing Without Hardware

You can test most functionality on your Mac:

  1. Install Ollama locally:

    brew install ollama
    ollama pull llama3.2
    
  2. Run with mocks:

    cargo run -- --mock-gpio --skip-checks
    

Limitations of local testing:

  • Can’t test actual LED/button
  • Audio quality depends on your Mac’s mic
  • No whisper.cpp or piper (unless you install them)
  • Wake word detection and the state machine still work, though

Architecture Highlights

Sidecar Pattern

The sidecar handles audio I/O separately from the main Rust application:

  • Sidecar HTTP API:

    • GET /health - Health check
    • GET /events - SSE stream for wake word events
    • POST /record - Record audio for specified duration
    • POST /reset - Reset detection state
  • Benefits:

    • Python handles audio streaming (sounddevice)
    • Rust handles orchestration and LLM logic
    • Independent restart/crash recovery
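To make the API concrete, here is a sketch of the raw HTTP request a client might send to `POST /record`. The paths match the list above, but the JSON body (`duration_secs`) is a hypothetical shape, not the documented sidecar contract:

```rust
// Builds a raw HTTP/1.1 request for the sidecar's /record endpoint.
// The JSON body shape is assumed for illustration.
fn record_request(duration_secs: u32) -> String {
    let body = format!("{{\"duration_secs\": {}}}", duration_secs);
    format!(
        "POST /record HTTP/1.1\r\nHost: 127.0.0.1:8080\r\nContent-Type: application/json\r\nContent-Length: {}\r\n\r\n{}",
        body.len(),
        body
    )
}

fn main() {
    // The README mentions 5-second clips captured via the sidecar.
    let req = record_request(5);
    assert!(req.starts_with("POST /record HTTP/1.1"));
    assert!(req.ends_with("{\"duration_secs\": 5}"));
    println!("{}", req);
}
```

In the real client this would go through an HTTP library rather than hand-built request strings; the sketch only shows what crosses the wire.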

Async State Machine

Idle → Listening → Recording → Transcribing →
Thinking → Speaking → Idle

Each state has:

  • Entry actions (LED, sounds)
  • Async operations (non-blocking)
  • Exit cleanup
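The loop above could be modeled as an enum whose transitions mirror the pipeline. This is a simplified synchronous sketch with illustrative names; the real state machine is async and fires entry/exit actions:

```rust
// Simplified sketch of the state loop. State names follow the README;
// the real implementation is async and drives LEDs/sounds on entry.
#[derive(Debug, Clone, Copy, PartialEq)]
enum State {
    Idle,
    Listening,
    Recording,
    Transcribing,
    Thinking,
    Speaking,
}

fn next(state: State) -> State {
    match state {
        State::Idle => State::Listening,
        State::Listening => State::Recording,
        State::Recording => State::Transcribing,
        State::Transcribing => State::Thinking,
        State::Thinking => State::Speaking,
        // Back to Idle, ready for the next wake word.
        State::Speaking => State::Idle,
    }
}

fn main() {
    // Walk one full pipeline cycle: six transitions return to Idle.
    let mut s = State::Idle;
    for _ in 0..6 {
        s = next(s);
    }
    assert_eq!(s, State::Idle);
    println!("completed one cycle back to {:?}", s);
}
```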

Trait-Based GPIO

#[async_trait]
trait Gpio {
    async fn led_on(&mut self) -> Result<()>;
    async fn led_off(&mut self) -> Result<()>;
    async fn is_button_pressed(&self) -> Result<bool>;
}

// Two implementations:
// - AiyHatReal: System commands on Pi (gpioset/gpioget)
// - AiyHatMock: Console output on Mac
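A mock backend can then be swapped in for local runs. Here is a simplified synchronous sketch of that idea (the real trait is async via `async_trait` and returns `Result`; the field and behavior shown are illustrative):

```rust
// Simplified synchronous sketch of a mock GPIO backend.
// The real trait is async; this shows only the swap-in pattern.
trait Gpio {
    fn led_on(&mut self);
    fn led_off(&mut self);
    fn is_button_pressed(&self) -> bool;
}

struct AiyHatMock {
    led: bool,
}

impl Gpio for AiyHatMock {
    fn led_on(&mut self) {
        self.led = true;
        println!("[mock gpio] LED ON");
    }
    fn led_off(&mut self) {
        self.led = false;
        println!("[mock gpio] LED OFF");
    }
    fn is_button_pressed(&self) -> bool {
        false // no physical button in dev mode
    }
}

fn main() {
    let mut gpio = AiyHatMock { led: false };
    gpio.led_on(); // recording started
    gpio.led_off(); // back to idle
    assert!(!gpio.is_button_pressed());
}
```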

Feature Flags

  • sidecar (default): Use Python HTTP sidecar for wake word
  • native-wakeword: Native ONNX wake word (conditionally compiled)
  • local-dev: Local development settings (macOS)
  • pi-deploy: Production deployment settings
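A feature flag like `sidecar` might gate the wake word backend at compile time along these lines (the function name is illustrative, not from the codebase):

```rust
// Sketch of compile-time backend selection via the `sidecar` feature.
// Function name is illustrative only.
#[cfg(feature = "sidecar")]
fn wakeword_backend() -> &'static str {
    "sidecar"
}

#[cfg(not(feature = "sidecar"))]
fn wakeword_backend() -> &'static str {
    "native"
}

fn main() {
    // Built with the `sidecar` feature (the default), this reports
    // "sidecar"; without it, the native path is compiled in instead.
    println!("wake word backend: {}", wakeword_backend());
}
```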

Development Workflow

1. Edit Code Locally

cd botface
# Edit src/*.rs files

2. Test on Mac

# Quick iteration
cargo run -- --mock-gpio

# With all logging
cargo run -- --verbose 2>&1 | grep -E "(DEBUG|INFO|WARN)"

3. Build for Pi

just build-pi
# or
cross build --release --target aarch64-unknown-linux-gnu

4. Deploy to Pi

# See AGENTS.md for detailed deploy commands
scp target/aarch64-unknown-linux-gnu/release/botface \
   root@<pi-ip>:/userdata/voice-assistant/

5. Monitor

ssh root@<pi-ip> "tail -f /tmp/botface.log /tmp/sidecar.log"

Learning Rust with This Project

This codebase demonstrates:

  • Async/await with tokio
  • Traits and generics for GPIO abstraction
  • Error handling with anyhow/thiserror
  • Cross-compilation for embedded targets
  • HTTP client/server with reqwest and SSE
  • Subprocess management for external binaries
  • Configuration management with serde
  • Feature flags for conditional compilation

Documentation

  • docs/INTEGRATION_ROADMAP.md - Complete deployment guide
  • docs/dev-log/ - Development session logs
  • AGENTS.md - Coding guidelines for AI assistants
  • .opencode/ci-knowledge.md - CI/CD knowledge

License

MIT License - See LICENSE file for details