Botface - Voice Assistant for Raspberry Pi 5 + AIY Voice HAT

Offline voice-controlled AI assistant written in Rust for Raspberry Pi 5 + Google AIY Voice HAT v1 + Batocera.

Status: Core architecture complete, integrations in progress. Wake word detection, LED control, transcription, LLM responses, and audio playback are functional on the AIY Voice HAT.

Architecture

Botface uses a sidecar pattern for audio I/O:

  • Botface (Rust): Main state machine, LLM integration (Ollama), TTS (Piper), orchestration
  • Sidecar (Python): HTTP service handling wake word detection (openWakeWord) and audio recording
  • Communication: HTTP + SSE (Server-Sent Events) between Botface and sidecar

This architecture provides:

  • Language isolation (Python crashes don’t affect Rust)
  • Independent audio lifecycle
  • Better monitoring and health checks
User Speech → Sidecar (Python) → SSE Events → Botface (Rust)
                                              ↓
                         TTS Audio ← Piper ← LLM Response ← Ollama
                                              ↓
                           LED + AIY Voice HAT Speaker
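As a sketch of how the Rust side might consume the sidecar's SSE stream, here is a minimal `data:` line parser using only the standard library. The event name and JSON payload shape shown are assumptions for illustration, not the actual sidecar protocol.

```rust
// Minimal SSE parser: extracts `data:` payloads from a raw
// Server-Sent Events stream, as a Botface-style client might.
fn parse_sse_data(stream: &str) -> Vec<String> {
    stream
        .lines()
        .filter_map(|line| line.strip_prefix("data:"))
        .map(|payload| payload.trim().to_string())
        .collect()
}

fn main() {
    // Example stream as the sidecar might emit it (shape assumed).
    let raw = "event: wake\ndata: {\"score\": 0.93}\n\nevent: wake\ndata: {\"score\": 0.88}\n\n";
    let events = parse_sse_data(raw);
    assert_eq!(events, vec!["{\"score\": 0.93}", "{\"score\": 0.88}"]);
    println!("parsed {} wake events", events.len());
}
```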

Quick Start (Local Development on macOS)

One-Time Setup

# Download required models and binaries
./scripts/setup.sh --dev

# This downloads:
# - Wake word model (hey_jarvis.onnx)
# - Whisper binary (speech-to-text)
# - Whisper model (ggml-base.en.bin)
# - Creates default config.toml

Running the Assistant

cd Botface

# Run in development mode (mock GPIO, local audio)
cargo run

# Or with explicit flags
cargo run -- --mock-gpio --local-audio --verbose

# Check CLI help
cargo run -- --help

What happens in local mode:

  • Uses your Mac’s microphone via cpal
  • GPIO operations print to console instead of controlling hardware
  • Validates Ollama connection
  • Skips Pi-specific binary checks
  • Note: Sidecar not used in local dev mode (native wake word detection)

Production Build for Pi

Build and Deploy Workflow

CRITICAL: Build on macOS, deploy to Pi. Never build on the Pi.

# 1. Build for Raspberry Pi 5 (ARM64) on macOS
cross build --release --target aarch64-unknown-linux-gnu

# Binary location: target/aarch64-unknown-linux-gnu/release/botface

# 2. Deploy to Pi
scp target/aarch64-unknown-linux-gnu/release/botface \
   root@<pi-ip>:/userdata/voice-assistant/

# 3. Start services on Pi
ssh root@<pi-ip> "cd /userdata/voice-assistant && \
   python3 wakeword_sidecar.py --model models/hey_jarvis.onnx --threshold 0.5 --port 8080 & \
   ./botface"

See docs/INTEGRATION_ROADMAP.md for detailed deployment instructions.

Project Structure

├── Cargo.toml              # Dependencies & features
├── Cargo.lock              # Dependency lock file
├── Cross.toml              # Cross-compilation configuration
├── README.md               # This file
├── docs/                   # Additional documentation
│   ├── INTEGRATION_ROADMAP.md  # Deployment guide
│   ├── dev-log/            # Development session logs
│   └── ARCHITECTURE.md     # System design
├── assets/                 # Static assets
│   ├── sounds/             # WAV sound effects
│   └── models/             # ONNX models (not in git)
├── src/
│   ├── main.rs             # Entry point with CLI args
│   ├── lib.rs              # Library exports
│   ├── config.rs           # Configuration (local vs Pi)
│   ├── state_machine.rs    # Core state management
│   ├── sidecar/            # HTTP client for sidecar
│   ├── audio/              # Audio playback (TTS output)
│   ├── wakeword/           # Wake word (native, sidecar preferred)
│   ├── stt/                # Speech-to-text (whisper.cpp)
│   ├── llm/                # Language model (Ollama)
│   ├── tts/                # Text-to-speech (Piper)
│   ├── gpio/               # Hardware control (real + mock)
│   └── sounds/             # Sound effects
├── scripts/
│   ├── wakeword_sidecar.py # Python HTTP sidecar
│   ├── build.sh            # Cross-compile for Pi 5
│   └── deploy.sh           # Deploy to Pi via rsync
└── config.toml             # Configuration file

Verified Working Features

All components tested and verified on Raspberry Pi 5 + AIY Voice HAT v1:

  • Wake Word Detection: “Hey Jarvis” detected via sidecar (scores 0.85-0.99)
  • LED Control: Physical LED on AIY HAT (ON during recording, OFF when idle)
  • Audio Recording: 5-second clips captured via sidecar
  • Speech-to-Text: whisper.cpp transcribes with high accuracy
  • LLM Integration: Ollama generates contextual responses
  • Text-to-Speech: Piper synthesizes natural speech
  • Audio Playback: Verified working through AIY Voice HAT speaker (using aplay -D plughw:0,0)
  • State Machine: Full pipeline Idle → Wake → Record → Transcribe → Think → Speak → Idle

Development vs Production Modes

Local Development (macOS/Linux Desktop)

Features:

  • Audio Input: Uses cpal to capture from your Mac’s microphone
  • Audio Output: System default audio device
  • GPIO: Mock implementation (prints to console)
  • Wake Word: Native Rust (optional, sidecar not used)
  • Validation: Checks for Ollama, skips Pi-specific binaries

Useful for:

  • Testing state machine logic
  • Debugging LLM integration
  • Rapid iteration without deploying

Production (Raspberry Pi 5)

Features:

  • Audio Input: Sidecar (Python) with sounddevice + openWakeWord
  • Audio Output: aplay -D plughw:0,0 (direct to AIY Voice HAT)
  • GPIO: Real hardware control via gpioset/gpioget
  • Validation: Checks all binaries (whisper, piper, ollama, sidecar)

Deployed via:

  • Cross-compiled ARM64 binary on macOS
  • SCP to /userdata/voice-assistant/
  • Manual start of sidecar + botface

Configuration

The assistant automatically detects your platform and adjusts:

macOS (Local Dev):

[dev_mode]
enabled = true
mock_gpio = true
local_audio = true
skip_binary_checks = true

[audio]
device = "default"

[gpio]
mock_enabled = true

Raspberry Pi (Production):

[dev_mode]
enabled = false

[wakeword]
model_path = "/userdata/voice-assistant/models/hey_jarvis.onnx"
threshold = 0.5

[stt]
whisper_binary = "/userdata/voice-assistant/whisper-cli"
whisper_model = "/userdata/voice-assistant/models/ggml-base.en.bin"

[tts]
piper_binary = "/userdata/voice-assistant/piper/piper"
voice_model = "/userdata/voice-assistant/models/en_US-amy-medium.onnx"

[gpio]
mock_enabled = false
led_pin = 25

Create config.toml in project root for local testing, or in /userdata/voice-assistant/ on Pi.
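The platform detection above could be done with compile-time target information; the following is a simplified sketch of that idea (the `DevMode` struct and `default_dev_mode` function are illustrative names, and the real config logic may differ):

```rust
// Sketch of platform-based config defaults using compile-time
// target detection; the actual implementation may differ.
#[derive(Debug, PartialEq)]
struct DevMode {
    enabled: bool,
    mock_gpio: bool,
    local_audio: bool,
}

fn default_dev_mode() -> DevMode {
    // On macOS, default to development settings;
    // on Linux (the Pi under Batocera), default to production.
    let on_mac = cfg!(target_os = "macos");
    DevMode {
        enabled: on_mac,
        mock_gpio: on_mac,
        local_audio: on_mac,
    }
}

fn main() {
    let dev = default_dev_mode();
    println!("dev_mode enabled: {}", dev.enabled);
}
```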

Usage Examples

Local Development Mode

# Basic local run (auto-detects macOS)
cargo run

# With verbose logging
cargo run -- --verbose

# Skip dependency checks (faster startup)
cargo run -- --skip-checks

# Custom config
cargo run -- --config ./my-config.toml

Production Mode on Pi

# Set your Pi's IP address
PI_IP="192.168.X.X"

# On macOS - Build release binary for Pi
cross build --release --target aarch64-unknown-linux-gnu

# Deploy
scp target/aarch64-unknown-linux-gnu/release/botface \
   root@$PI_IP:/userdata/voice-assistant/

# On Pi - Start sidecar first, then botface
ssh root@$PI_IP "cd /userdata/voice-assistant && \
   python3 wakeword_sidecar.py --model models/hey_jarvis.onnx --threshold 0.5 --port 8080 > /tmp/sidecar.log 2>&1 & \
   export LD_LIBRARY_PATH=/userdata/voice-assistant:\$LD_LIBRARY_PATH && \
   ./botface > /tmp/botface.log 2>&1 &"

# View logs
ssh root@$PI_IP "tail -f /tmp/botface.log /tmp/sidecar.log"

Testing Without Hardware

You can test most functionality on your Mac:

  1. Install Ollama locally:

    brew install ollama
    ollama pull llama3.2
    
  2. Run with mocks:

    cargo run -- --mock-gpio --skip-checks
    

Limitations of local testing:

  • Can’t test actual LED/button
  • Audio quality depends on your Mac’s mic
  • No whisper.cpp or piper (unless you install them)
  • Wake word detection and the state machine still work, though

Architecture Highlights

Sidecar Pattern

The sidecar handles audio I/O separately from the main Rust application:

  • Sidecar HTTP API:

    • GET /health - Health check
    • GET /events - SSE stream for wake word events
    • POST /record - Record audio for specified duration
    • POST /reset - Reset detection state
  • Benefits:

    • Python handles audio streaming (sounddevice)
    • Rust handles orchestration and LLM logic
    • Independent restart/crash recovery
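To make the API concrete, here is a sketch of the raw HTTP request a client might send to `POST /record`. The paths match the list above, but the JSON body (`duration_secs`) is a hypothetical shape, not the documented sidecar contract:

```rust
// Builds a raw HTTP/1.1 request for the sidecar's /record endpoint.
// The JSON body shape is assumed for illustration.
fn record_request(duration_secs: u32) -> String {
    let body = format!("{{\"duration_secs\": {}}}", duration_secs);
    format!(
        "POST /record HTTP/1.1\r\nHost: 127.0.0.1:8080\r\nContent-Type: application/json\r\nContent-Length: {}\r\n\r\n{}",
        body.len(),
        body
    )
}

fn main() {
    // The README mentions 5-second clips captured via the sidecar.
    let req = record_request(5);
    assert!(req.starts_with("POST /record HTTP/1.1"));
    assert!(req.ends_with("{\"duration_secs\": 5}"));
    println!("{}", req);
}
```

In the real client this would go through an HTTP library rather than hand-built request strings; the sketch only shows what crosses the wire.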

Async State Machine

Idle → Listening → Recording → Transcribing →
Thinking → Speaking → Idle

Each state has:

  • Entry actions (LED, sounds)
  • Async operations (non-blocking)
  • Exit cleanup
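The loop above could be modeled as an enum whose transitions mirror the pipeline. This is a simplified synchronous sketch with illustrative names; the real state machine is async and fires entry/exit actions:

```rust
// Simplified sketch of the state loop. State names follow the README;
// the real implementation is async and drives LEDs/sounds on entry.
#[derive(Debug, Clone, Copy, PartialEq)]
enum State {
    Idle,
    Listening,
    Recording,
    Transcribing,
    Thinking,
    Speaking,
}

fn next(state: State) -> State {
    match state {
        State::Idle => State::Listening,
        State::Listening => State::Recording,
        State::Recording => State::Transcribing,
        State::Transcribing => State::Thinking,
        State::Thinking => State::Speaking,
        // Back to Idle, ready for the next wake word.
        State::Speaking => State::Idle,
    }
}

fn main() {
    // Walk one full pipeline cycle: six transitions return to Idle.
    let mut s = State::Idle;
    for _ in 0..6 {
        s = next(s);
    }
    assert_eq!(s, State::Idle);
    println!("completed one cycle back to {:?}", s);
}
```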

Trait-Based GPIO

#[async_trait]
trait Gpio {
    async fn led_on(&mut self) -> Result<()>;
    async fn led_off(&mut self) -> Result<()>;
    async fn is_button_pressed(&self) -> Result<bool>;
}

// Two implementations:
// - AiyHatReal: System commands on Pi (gpioset/gpioget)
// - AiyHatMock: Console output on Mac
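A mock backend can then be swapped in for local runs. Here is a simplified synchronous sketch of that idea (the real trait is async via `async_trait` and returns `Result`; the field and behavior shown are illustrative):

```rust
// Simplified synchronous sketch of a mock GPIO backend.
// The real trait is async; this shows only the swap-in pattern.
trait Gpio {
    fn led_on(&mut self);
    fn led_off(&mut self);
    fn is_button_pressed(&self) -> bool;
}

struct AiyHatMock {
    led: bool,
}

impl Gpio for AiyHatMock {
    fn led_on(&mut self) {
        self.led = true;
        println!("[mock gpio] LED ON");
    }
    fn led_off(&mut self) {
        self.led = false;
        println!("[mock gpio] LED OFF");
    }
    fn is_button_pressed(&self) -> bool {
        false // no physical button in dev mode
    }
}

fn main() {
    let mut gpio = AiyHatMock { led: false };
    gpio.led_on(); // recording started
    gpio.led_off(); // back to idle
    assert!(!gpio.is_button_pressed());
}
```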

Feature Flags

  • sidecar (default): Use Python HTTP sidecar for wake word
  • native-wakeword: Native ONNX wake word (conditionally compiled)
  • local-dev: Local development settings (macOS)
  • pi-deploy: Production deployment settings
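A feature flag like `sidecar` might gate the wake word backend at compile time along these lines (the function name is illustrative, not from the codebase):

```rust
// Sketch of compile-time backend selection via the `sidecar` feature.
// Function name is illustrative only.
#[cfg(feature = "sidecar")]
fn wakeword_backend() -> &'static str {
    "sidecar"
}

#[cfg(not(feature = "sidecar"))]
fn wakeword_backend() -> &'static str {
    "native"
}

fn main() {
    // Built with the `sidecar` feature (the default), this reports
    // "sidecar"; without it, the native path is compiled in instead.
    println!("wake word backend: {}", wakeword_backend());
}
```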

Development Workflow

1. Edit Code Locally

cd botface
# Edit src/*.rs files

2. Test on Mac

# Quick iteration
cargo run -- --mock-gpio

# With all logging
cargo run -- --verbose 2>&1 | grep -E "(DEBUG|INFO|WARN)"

3. Build for Pi

just build-pi
# or
cross build --release --target aarch64-unknown-linux-gnu

4. Deploy to Pi

# See AGENTS.md for detailed deploy commands
scp target/aarch64-unknown-linux-gnu/release/botface \
   root@<pi-ip>:/userdata/voice-assistant/

5. Monitor

ssh root@<pi-ip> "tail -f /tmp/botface.log /tmp/sidecar.log"

Learning Rust with This Project

This codebase demonstrates:

  • Async/await with tokio
  • Traits and generics for GPIO abstraction
  • Error handling with anyhow/thiserror
  • Cross-compilation for embedded targets
  • HTTP client/server with reqwest and SSE
  • Subprocess management for external binaries
  • Configuration management with serde
  • Feature flags for conditional compilation

Documentation

  • docs/INTEGRATION_ROADMAP.md - Complete deployment guide
  • docs/dev-log/ - Development session logs
  • AGENTS.md - Coding guidelines for AI assistants
  • .opencode/ci-knowledge.md - CI/CD knowledge

License

MIT License - See LICENSE file for details