AIY Voice HAT on Batocera - Voice Assistant Setup

Overview

Complete working voice assistant for Raspberry Pi 5 + Google AIY Voice HAT v1 + Batocera.

Trigger Methods:

Wake Word - Say “Hey Jarvis” (now working!)
Physical Button - Press button on GPIO 23 (alternative method)

Why Two Methods: Wake word is now fully functional, but button remains as a reliable alternative in noisy environments.

What Actually Works ✅

Wake Word OR Button → Record → Transcribe → LLM → TTS → Play

✅ Wake word detection - “Hey Jarvis” (NEW - now working!)
✅ Button trigger on GPIO 23 (reliable backup)
✅ LED feedback on GPIO 25 (visual status indication)
✅ Audio recording via direct ALSA plughw:0,0 (bypasses PipeWire)
✅ Speech-to-text via locally compiled whisper.cpp (ARM64 Pi 5 compatible)
✅ LLM via Ollama (local, offline)
✅ Text-to-speech via Piper (natural neural voice)
✅ Audio playback via AIY HAT speaker

Important Documents

wake-word-working.md - Details on the working wake word implementation
wrong-assumptions.md - Catalog of incorrect assumptions and lessons learned
This Guide - Complete setup instructions

File Structure

/userdata/voice-assistant/
├── voice_assistant_wake.py       # Main script - Wake word mode ⭐ NEW
├── voice_assistant_button.py     # Alternative - Button mode
├── whisper-cli                   # Compiled STT binary (~917KB)
├── libwhisper.so.1              # Required library (~541KB)
├── libggml.so.0                 # Required library (~48KB)
├── libggml-base.so.0            # Required library (~649KB)
├── libggml-cpu.so.0             # Required library (~767KB)
├── wake-word-working.md         # Wake word documentation
├── wrong-assumptions.md         # Lessons learned
├── models/
│   ├── hey_jarvis.onnx          # Wake word model
│   ├── ggml-base.en.bin         # Whisper model (~142MB)
│   └── en_US-amy-medium.onnx    # Piper voice (~61MB)
├── piper/
│   └── piper                    # TTS binary (~2.8MB)
└── temp/                        # Temporary audio files

Prerequisites

Raspberry Pi 5 (4GB or 8GB)
Google AIY Voice HAT v1 (with button and LED wired)
Batocera v40+ installed and running
SSH access to Pi

Step-by-Step Setup

1. Install Ollama

mkdir -p /userdata/ollama
cd /userdata/ollama
curl -L -o ollama-linux-arm64.tar.zst "https://ollama.com/download/ollama-linux-arm64.tar.zst"
tar -xf ollama-linux-arm64.tar.zst
rm ollama-linux-arm64.tar.zst

# Add to shell config
echo 'export PATH="/userdata/ollama/bin:$PATH"' >> ~/.bashrc
echo 'export OLLAMA_HOME="/userdata/ollama"' >> ~/.bashrc
source ~/.bashrc

# Start and pull model
ollama serve &
ollama pull llama3.2

2. Install Piper TTS

cd /userdata/voice-assistant
curl -L -o piper.tar.gz "https://github.com/rhasspy/piper/releases/download/v1.2.0/piper_arm64.tar.gz"
tar -xzf piper.tar.gz
mv piper_arm64/* piper/
rmdir piper_arm64
rm piper.tar.gz

3. Download Voice Model

cd /userdata/voice-assistant
mkdir -p models

curl -L -o models/en_US-amy-medium.onnx \
  "https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_US/amy/medium/en_US-amy-medium.onnx"

curl -L -o models/en_US-amy-medium.onnx.json \
  "https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_US/amy/medium/en_US-amy-medium.onnx.json"

4. Download Whisper Model

cd /userdata/voice-assistant/models
curl -L -o ggml-base.en.bin \
  "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin"

5. Download Wake Word Model (For Wake Word Mode)

cd /userdata/voice-assistant/models

# Download "Hey Jarvis" wake word model
curl -L -o hey_jarvis.onnx \
  "https://github.com/dscripka/openwakeword-models/raw/main/models/hey_jarvis.onnx"

Note: The wake word model is only needed if using voice_assistant_wake.py. The button-based voice_assistant_button.py doesn’t need this.

6. Compile whisper.cpp (CRITICAL)

On your Mac with Docker:

# Build ARM64 Linux binary
docker run --rm --platform linux/arm64 \
  -v /tmp/whisper-out:/output \
  arm64v8/ubuntu:22.04 bash -c "
    apt-get update -qq
    apt-get install -y -qq git make cmake build-essential

    cd /tmp
    git clone --depth 1 https://github.com/ggml-org/whisper.cpp.git
    cd whisper.cpp
    make -j4

    # Copy binary and ALL libraries
    cp build/bin/whisper-cli /output/
    cp build/src/libwhisper.so* /output/
    cp build/ggml/src/libggml*.so* /output/
"

# Transfer to Pi
scp /tmp/whisper-out/* root@YOUR_PI_IP:/userdata/voice-assistant/

Why compile: Pre-built binaries crash with SIGILL on Pi 5 (incompatible CPU instructions).

6. Copy Main Script

# From your Mac:
scp voice_assistant_button.py root@YOUR_PI_IP:/userdata/voice-assistant/

# On Pi:
ssh root@YOUR_PI_IP
chmod +x /userdata/voice-assistant/voice_assistant_button.py

7. Fix Shell Environment

# Add to ~/.bash_profile (Batocera uses login shells)
echo 'if [ -f ~/.bashrc ]; then source ~/.bashrc; fi' >> ~/.bash_profile

# Add to ~/.bashrc
echo 'export PATH="/userdata/ollama/bin:$PATH"' >> ~/.bashrc
echo 'export OLLAMA_HOME="/userdata/ollama"' >> ~/.bashrc

# Apply
source ~/.bashrc

Running the Assistant

You now have two working modes - choose based on your preference!

Option 1: Wake Word Mode ⭐ (Recommended)

Hands-free voice activation - just say “Hey Jarvis”

cd /userdata/voice-assistant
python3 voice_assistant_wake.py

Usage:

Wait for “Listening for ‘Hey Jarvis’…” message
Say “Hey Jarvis” clearly (you’ll see a score appear)
When you see “🎉 WAKE WORD DETECTED!”, speak your question
Wait for the assistant to respond
System returns to listening mode automatically

Tips:

Speak clearly and within 6-12 inches of the microphone
If wake word doesn’t trigger, check your audio levels first
Press Ctrl+C to exit

Option 2: Button Mode (Alternative)

Physical button activation - more reliable in noisy environments

cd /userdata/voice-assistant
python3 voice_assistant_button.py

Usage:

LED blinks 3 times (startup)
Press button on AIY HAT
LED blinks quickly (recording 5 seconds)
Speak your question
LED blinks (processing)
Assistant speaks response

Which Mode to Choose?

Feature	Wake Word	Button
Hands-free	✅ Yes	❌ No
Reliability	Good*	Excellent
Speed	Instant	Requires press
Best for	Quiet environments	Noisy environments

*Wake word works well in most conditions but may occasionally miss in very noisy environments or if speech is unclear.

Troubleshooting

“Device or resource busy” Error

# Kill stuck Python processes
pkill -9 -f 'python.*button'
pkill -9 -f 'python.*voice'

# Verify audio device is free
lsof /dev/snd/pcmC0D0c

No Speech Detected

Test microphone independently:

# Record 3 seconds
arecord -D plughw:0,0 -f S16_LE -r 16000 -c 1 -d 3 /tmp/test.wav

# Play back
aplay /tmp/test.wav

# If you hear your voice, mic is working

whisper-cli “error while loading shared libraries”

Ensure all .so files are present:

ls -la /userdata/voice-assistant/*.so*

Should show:

libwhisper.so.1
libggml.so.0
libggml-base.so.0
libggml-cpu.so.0

“Host is down” Recording Error

This means PipeWire is blocking the device. Use plughw:0,0 not default.

Check if PipeWire is running:

ps aux | grep pipewire
# If running, you may need to restart or use different approach

LED/Button Not Working

Verify GPIO access:

# Test LED
gpioset gpiochip0 25=1  # LED on
gpioset gpiochip0 25=0  # LED off

# Test button (press and hold, then run)
gpioget gpiochip0 23  # Should return 0 when pressed

Architecture

┌─────────────┐     ┌──────────────┐     ┌──────────────┐
│   Button    │────▶│   Record     │────▶│  Transcribe  │
│  GPIO 23    │     │  arecord     │     │ whisper-cli  │
└─────────────┘     │ plughw:0,0   │     │ + libraries  │
     │              └──────────────┘     └──────┬───────┘
     │                                           │
     │              ┌──────────────┐          │
     └─────────────▶│     LED      │◀─────────┘
                    │   GPIO 25    │    (status feedback)
                    └──────────────┘

        ┌──────────────┐     ┌──────────────┐     ┌─────────────┐
        │     LLM      │────▶│     TTS      │────▶│    Play     │
        │    Ollama    │     │    Piper     │     │   aplay     │
        │  llama3.2    │     │ + voice.onnx│     │  AIY HAT    │
        └──────────────┘     └──────────────┘     └─────────────┘

Key Technical Details

Audio Device Selection

plughw:0,0 (Direct ALSA) - ✅ WORKS

Bypasses PipeWire
No rate conversion overhead
Reliable, no “Host is down” errors

default (PipeWire) - ❌ FAILS

PipeWire blocks device
“Host is down” errors
Conflicts with other audio

Why Wake Word Initially Failed (And How We Fixed It)

The Problem: Initially, OpenWakeWord returned ~0.000 scores for ALL audio input, appearing incompatible with AIY HAT.

The Solution: After reverse-engineering be-more-agent’s working implementation, we identified three critical fixes:

Audio format: Changed from float32 to int16
Resampling: Changed from simple [::3] to scipy.signal.resample()
Score checking: Changed from immediate predict() to prediction_buffer

Result: Wake word now achieves 0.5-0.95 detection scores consistently!

See wake-word-working.md for complete technical details.

voice_assistant_wake.py - Wake word mode (hands-free)
voice_assistant_button.py - Button mode (GPIO trigger)

Binaries (Compile or Download)

whisper-cli (~917KB) - Speech recognition
piper/piper (~2.8MB) - Text-to-speech

Libraries (Compile with whisper.cpp)

libwhisper.so.1 (~541KB)
libggml.so.0 (~48KB)
libggml-base.so.0 (~649KB)
libggml-cpu.so.0 (~767KB)

Models (Download)

models/hey_jarvis.onnx (~??MB) - Wake word model
models/ggml-base.en.bin (~142MB) - Whisper speech model
models/en_US-amy-medium.onnx (~61MB) - Piper voice model

Comparison: Wake Word vs Button

Feature	Wake Word	Button
Status	✅ Fully working	✅ Fully working
Reliability	90%+ detection	100% (physical)
Hands-free	✅ Yes	❌ No
Best for	Quiet environments	Noisy environments
Latency	~200ms detection	~100ms detection
User experience	Natural, conversational	Intentional, tactile
Implementation	ML model + GPIO	Simple GPIO only

Recommendation: Use wake word mode for most situations. Switch to button mode if you’re in a noisy environment.

Status

✅ FULLY WORKING - March 10, 2026

Tested on: Raspberry Pi 5 8GB
OS: Batocera v40
Hardware: Google AIY Voice HAT v1
Wake word: Working (0.5-0.95 detection scores)
Button: Working (100% reliable)

Next Steps (Optional)

Customize wake word - Train your own OpenWakeWord model for different phrases
Multiple wake words - Add support for different activation phrases
Custom voice - Try different Piper voice models
VAD integration - Add Voice Activity Detection to improve recording
Batocera integration - Create voice commands to launch games
Different LLM models - Experiment with other Ollama models (codellama, mistral, etc.)

Both modes are functional and working reliably on the test hardware!

Resources

Ollama - Local LLM runtime
whisper.cpp - Speech recognition
Piper - Neural TTS
OpenWakeWord - Wake word detection (now working!)
AIY Projects - Voice HAT documentation

License

MIT License - See LICENSE file for details.

Created: March 10, 2026 Last tested: Batocera v40, Raspberry Pi 5, AIY Voice HAT v1

Keyboard shortcuts

Botface Voice Assistant Documentation