# AIY Voice Assistant - Project Summary
## Mission Status

The voice-controlled AI assistant for Raspberry Pi 5 + Google AIY Voice HAT v1 on Batocera has working wake word and button activation via two separate scripts.
## What We Built

### Two Working Voice Assistants
1. **Wake Word Mode** (`voice_assistant_wake.py`)
   - Say "Hey Jarvis" to activate
   - Hands-free operation
   - Detection confidence scores: 0.5-0.95
   - Returns to continuous listening after each interaction

2. **Button Mode** (`voice_assistant_button.py`)
   - Press the GPIO 23 button to activate
   - LED feedback on GPIO 25
   - More reliable in noisy environments
   - Always available as a backup
### Complete Pipeline (Both Modes)

```
Trigger → Record (arecord) → Transcribe (whisper.cpp) → LLM (Ollama) → TTS (Piper) → Play (aplay)
```
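The pipeline above can be sketched as a chain of subprocess commands. This is an illustrative sketch only, not the project's actual code: file paths, the whisper model name, and the Piper voice file are placeholder assumptions, while the binary names (`arecord`, `ollama`, `piper`, `aplay`) and flags are standard.

```python
import subprocess

def record_cmd(wav_path, seconds=5, device="plughw:0,0"):
    """arecord: capture mono 16 kHz 16-bit audio from the AIY HAT mic."""
    return ["arecord", "-D", device, "-f", "S16_LE", "-r", "16000",
            "-c", "1", "-d", str(seconds), wav_path]

def transcribe_cmd(wav_path, model="models/ggml-base.en.bin"):
    """whisper-cli: transcribe the recording to text on stdout.
    (model path is a placeholder assumption)"""
    return ["./whisper-cli", "-m", model, "-f", wav_path, "--no-timestamps"]

def llm_cmd(prompt, model="llama3.2"):
    """ollama: generate a reply from the local LLM."""
    return ["ollama", "run", model, prompt]

def tts_cmd(out_wav, voice="en_US-lessac-medium.onnx"):
    """piper: synthesize speech from stdin text to a WAV file.
    (voice file name is a placeholder assumption)"""
    return ["piper", "--model", voice, "--output_file", out_wav]

def play_cmd(wav_path, device="plughw:0,0"):
    """aplay: play the synthesized WAV through the HAT speaker."""
    return ["aplay", "-D", device, wav_path]

def run(cmd, stdin_text=None):
    """Run one stage and return its stdout."""
    return subprocess.run(cmd, input=stdin_text,
                          capture_output=True, text=True).stdout
```

Each stage writes to a file or stdout that the next stage consumes, which is why the whole pipeline works offline with no shared state beyond `/tmp` WAV files.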
## 📚 Documentation Created

| Document | Purpose |
|---|---|
| `setup-guide.md` | Complete setup and installation instructions |
| `wake-word-working.md` | Details on the wake word implementation |
| `wrong-assumptions.md` | Catalog of incorrect assumptions and fixes |

All located in `/userdata/voice-assistant/` on your Pi.
## 🔑 Key Technical Achievements

### What Made Wake Word Work
After ~23 hours of debugging, we identified these critical fixes:
| Problem | Wrong Assumption | Correct Reality |
|---|---|---|
| Resampling | `audio[::3]` simple downsampling | `scipy.signal.resample()` with interpolation |
| Audio format | `float32` more precise | `int16` (model trained on this) |
| Score checking | Immediate `predict()` result | `prediction_buffer` (accumulated) |
| Device access | ALSA default device | `plughw:0,0` (bypasses PipeWire) |
| Binary compatibility | Pre-built binaries work | Must compile for Pi 5 ARM64 |
| Libraries | Only need main binary | Need all `.so` files |
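The resampling row in the table is worth a concrete illustration. A minimal sketch (the chunk size and test tone are illustrative, not from the project): naive slicing drops samples without any low-pass filtering, aliasing high frequencies into the band the model hears, while `scipy.signal.resample` interpolates properly and is then cast back to the `int16` format the model was trained on.

```python
import numpy as np
from scipy.signal import resample

hw_rate, model_rate = 48000, 16000

# 80 ms of 48 kHz int16 audio (a 440 Hz test tone stands in for the mic)
chunk = (np.sin(2 * np.pi * 440 * np.arange(3840) / hw_rate) * 10000).astype(np.int16)

# Wrong: simple decimation — keeps every 3rd sample, no anti-alias filtering
naive = chunk[::3]

# Right: Fourier-based resampling to the target length, then back to int16
target_len = int(len(chunk) * model_rate / hw_rate)  # 1280 samples
good = resample(chunk, target_len).astype(np.int16)
```

Both arrays end up the same length, which is exactly why the bug was so easy to miss: the shapes looked fine while the spectral content was wrong.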
### Why Previous Attempts Failed
The wake word detection went from ~0.000 scores to 0.5-0.95 after fixing:
- Audio format (`float32` → `int16`)
- Proper resampling (`scipy.signal.resample`)
- Checking `prediction_buffer` instead of the immediate result
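The third fix can be shown with a minimal stand-in. openWakeWord's `Model` keeps a rolling per-model buffer of recent scores; the `deque` below simulates that buffer rather than importing the real library, and the threshold/score values mirror the ones reported in this document.

```python
from collections import deque

THRESHOLD = 0.5  # same threshold the assistant uses

def wake_detected(prediction_buffer, threshold=THRESHOLD):
    """Check recent accumulated scores, not a single predict() return.

    A single call can land mid-phrase and score near 0.0; the rolling
    buffer still carries the spike from when the phrase completed.
    """
    return any(score >= threshold for score in prediction_buffer)

# Simulated scores: near-zero while "Hey Jarvis" is still being spoken,
# then the 0.878 spike once the full phrase has been heard.
buffer = deque(maxlen=10)
for score in [0.01, 0.02, 0.05, 0.878, 0.42]:
    buffer.append(score)

print(wake_detected(buffer))  # → True
```

Checking only the latest immediate result would have returned `False` for most of those iterations, which is why the early attempts saw ~0.000 and concluded the model was broken.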
## Quick Start Commands

**Wake Word Mode:**

```bash
cd /userdata/voice-assistant
python3 voice_assistant_wake.py
```

**Button Mode:**

```bash
cd /userdata/voice-assistant
python3 voice_assistant_button.py
```
**Auto-start on Boot (Already Enabled):**

```bash
# Check service status
batocera-services list

# The voice assistant now starts automatically at boot!
# View the log:
tail -f /tmp/voice-assistant.log
```

See `docs/service-setup.md` for complete service documentation.
## Test Results

```
============================================================
   AIY Voice HAT - Wake Word Assistant (Working!)
============================================================
Loading wake word model...
✓ Model loaded
  Threshold: 0.5
  Hardware: 48000Hz → Model: 16000Hz
  Resampling: YES
============================================================

👂 Listening for 'Hey Jarvis'... (activation #1)
Wake Word Score: 0.878 [==============================]

🎉 WAKE WORD DETECTED! (score: 0.878)
🎤 Recording 5 seconds...
📝 Transcribing...
👤 You: What is the weather like?
🤔 Thinking...
🤖 Assistant: I don't have access to real-time weather data, but I can help you understand weather patterns or discuss general climate information. Would you like to know about how weather forecasting works?
```
## Architecture

### Wake Word Flow

```
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│ "Hey Jarvis" │────▶│ SoundDevice  │────▶│  Resample    │
│ (User says)  │     │  plughw:0,0  │     │ scipy.signal │
└──────────────┘     │   48000Hz    │     │ 48000→16000  │
                     └──────────────┘     └──────┬───────┘
                                                 │
┌──────────────┐     ┌──────────────┐            │
│  Reset &     │◀────│  Check       │◀───────────┘
│  Process     │     │  prediction_ │
│  Command     │     │  buffer      │
└──────┬───────┘     └──────────────┘
       │
       ▼
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│  Record      │────▶│  Transcribe  │────▶│  LLM         │
│  arecord     │     │  whisper-cli │     │  Ollama      │
│  plughw:0,0  │     │  + libraries │     │  llama3.2    │
└──────────────┘     └──────────────┘     └──────┬───────┘
                                                 │
┌──────────────┐                                 │
│  Play        │◀────────────────────────────────┘
│  aplay       │   (speak response)
│  AIY HAT     │
└──────────────┘
```
### Button Flow

```
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│ Button Press │────▶│  LED Blink   │────▶│  Record      │
│  GPIO 23     │     │  GPIO 25     │     │  arecord     │
└──────────────┘     └──────────────┘     │  plughw:0,0  │
                                          └──────┬───────┘
                                                 │
       ┌─────────────────────────────────────────┘
       │
       ▼
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│  Transcribe  │────▶│  LLM         │────▶│  Play        │
│  whisper-cli │     │  Ollama      │     │  aplay       │
│  + libraries │     │  llama3.2    │     │  AIY HAT     │
└──────────────┘     └──────────────┘     └──────────────┘
```
## What You Can Do Now

- **Use it immediately** - Both modes are ready to go
- **Customize the wake word** - Train your own openWakeWord model
- **Add more features** - Multiple wake words, different LLM models
- **Integrate with Batocera** - Launch games via voice command
- **Create custom responses** - Personalized assistant personality
## 📖 Read the Documentation!

- `wrong-assumptions.md` - Learn from our mistakes (highly recommended)
- `wake-word-working.md` - Deep dive into the wake word solution
- `setup-guide.md` - Complete setup for new installations
## 🎓 Lessons Learned

- **Never assume** - Test every assumption about audio, models, and hardware
- **Find working examples** - `be-more-agent` had the answers we needed
- **Audio quality matters** - Proper resampling and format are critical
- **Documentation lies** - Read the source code when things don't work
- **Hardware is rarely broken** - Software/configuration issues are more common
## 🏆 Final Status

**PROJECT STATUS: ✅ COMPLETE AND WORKING**

Both wake word and button activation are fully functional and ready for daily use. The assistant runs entirely offline with local STT, LLM, and TTS.

- Total development time: ~25 hours
- Major breakthrough: wake word detection (the hardest part)
- Lines of code: ~500 across all implementations
- Documentation: ~1000 lines across 3 comprehensive guides
## 🙏 Credits & Acknowledgments

Wake word implementation inspired by `be-more-agent` by Brendan Polyak.

The working wake word detection approach was adapted from studying `be-more-agent`'s audio processing methodology, which helped identify:
- The importance of `int16` audio format (not `float32`)
- Proper resampling with `scipy.signal.resample` (not simple downsampling)
- Checking `prediction_buffer` instead of immediate prediction results
Thank you to the open source community for making local AI accessible!
Enjoy your fully offline, voice-controlled AI assistant! 🤖🎙️
Say “Hey Jarvis” or press the button to start talking to your AI.