# AIY Voice Assistant - Project Summary
## Mission Status

The voice-controlled AI assistant for Raspberry Pi 5 + Google AIY Voice HAT v1 on Batocera has working wake word and button activation via two separate scripts.
## What We Built

### Two Working Voice Assistants
1. **Wake Word Mode** (`voice_assistant_wake.py`)
   - Say "Hey Jarvis" to activate
   - Hands-free operation
   - Detection confidence scores: 0.5-0.95
   - Returns to continuous listening after each interaction

2. **Button Mode** (`voice_assistant_button.py`)
   - Press the GPIO 23 button to activate
   - LED feedback on GPIO 25
   - More reliable in noisy environments
   - Always available as a backup
### Complete Pipeline (Both Modes)

```
Trigger → Record (arecord) → Transcribe (whisper.cpp) → LLM (Ollama) → TTS (Piper) → Play (aplay)
```
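The pipeline above can be sketched as a chain of subprocess commands. This is an illustrative sketch only, not the project's actual code: file paths, the whisper model name, and the Piper voice file are placeholder assumptions, while the binary names (`arecord`, `ollama`, `piper`, `aplay`) and flags are standard.

```python
import subprocess

def record_cmd(wav_path, seconds=5, device="plughw:0,0"):
    """arecord: capture mono 16 kHz 16-bit audio from the AIY HAT mic."""
    return ["arecord", "-D", device, "-f", "S16_LE", "-r", "16000",
            "-c", "1", "-d", str(seconds), wav_path]

def transcribe_cmd(wav_path, model="models/ggml-base.en.bin"):
    """whisper-cli: transcribe the recording to text on stdout.
    (model path is a placeholder assumption)"""
    return ["./whisper-cli", "-m", model, "-f", wav_path, "--no-timestamps"]

def llm_cmd(prompt, model="llama3.2"):
    """ollama: generate a reply from the local LLM."""
    return ["ollama", "run", model, prompt]

def tts_cmd(out_wav, voice="en_US-lessac-medium.onnx"):
    """piper: synthesize speech from stdin text to a WAV file.
    (voice file name is a placeholder assumption)"""
    return ["piper", "--model", voice, "--output_file", out_wav]

def play_cmd(wav_path, device="plughw:0,0"):
    """aplay: play the synthesized WAV through the HAT speaker."""
    return ["aplay", "-D", device, wav_path]

def run(cmd, stdin_text=None):
    """Run one stage and return its stdout."""
    return subprocess.run(cmd, input=stdin_text,
                          capture_output=True, text=True).stdout
```

Each stage writes to a file or stdout that the next stage consumes, which is why the whole pipeline works offline with no shared state beyond `/tmp` WAV files.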
## 📚 Documentation Created

| Document | Purpose |
|---|---|
| `setup-guide.md` | Complete setup and installation instructions |
| `wake-word-working.md` | Details on the wake word implementation |
| `wrong-assumptions.md` | Catalog of incorrect assumptions and fixes |

All located in `/userdata/voice-assistant/` on your Pi.
## 🔑 Key Technical Achievements

### What Made Wake Word Work
After ~23 hours of debugging, we identified these critical fixes:
| Problem | Wrong Assumption | Correct Reality |
|---|---|---|
| Resampling | `audio[::3]` simple downsampling | `scipy.signal.resample()` with interpolation |
| Audio format | `float32` more precise | `int16` (model trained on this) |
| Score checking | Immediate `predict()` result | `prediction_buffer` (accumulated) |
| Device access | ALSA default device | `plughw:0,0` (bypasses PipeWire) |
| Binary compatibility | Pre-built binaries work | Must compile for Pi 5 ARM64 |
| Libraries | Only need main binary | Need all `.so` files |
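The resampling row in the table is worth a concrete illustration. A minimal sketch (the chunk size and test tone are illustrative, not from the project): naive slicing drops samples without any low-pass filtering, aliasing high frequencies into the band the model hears, while `scipy.signal.resample` interpolates properly and is then cast back to the `int16` format the model was trained on.

```python
import numpy as np
from scipy.signal import resample

hw_rate, model_rate = 48000, 16000

# 80 ms of 48 kHz int16 audio (a 440 Hz test tone stands in for the mic)
chunk = (np.sin(2 * np.pi * 440 * np.arange(3840) / hw_rate) * 10000).astype(np.int16)

# Wrong: simple decimation — keeps every 3rd sample, no anti-alias filtering
naive = chunk[::3]

# Right: Fourier-based resampling to the target length, then back to int16
target_len = int(len(chunk) * model_rate / hw_rate)  # 1280 samples
good = resample(chunk, target_len).astype(np.int16)
```

Both arrays end up the same length, which is exactly why the bug was so easy to miss: the shapes looked fine while the spectral content was wrong.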
### Why Previous Attempts Failed
The wake word detection went from ~0.000 scores to 0.5-0.95 after fixing:
- Audio format (`float32` → `int16`)
- Proper resampling (`scipy.signal.resample`)
- Checking `prediction_buffer` instead of the immediate result
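The third fix can be shown with a minimal stand-in. openWakeWord's `Model` keeps a rolling per-model buffer of recent scores; the `deque` below simulates that buffer rather than importing the real library, and the threshold/score values mirror the ones reported in this document.

```python
from collections import deque

THRESHOLD = 0.5  # same threshold the assistant uses

def wake_detected(prediction_buffer, threshold=THRESHOLD):
    """Check recent accumulated scores, not a single predict() return.

    A single call can land mid-phrase and score near 0.0; the rolling
    buffer still carries the spike from when the phrase completed.
    """
    return any(score >= threshold for score in prediction_buffer)

# Simulated scores: near-zero while "Hey Jarvis" is still being spoken,
# then the 0.878 spike once the full phrase has been heard.
buffer = deque(maxlen=10)
for score in [0.01, 0.02, 0.05, 0.878, 0.42]:
    buffer.append(score)

print(wake_detected(buffer))  # → True
```

Checking only the latest immediate result would have returned `False` for most of those iterations, which is why the early attempts saw ~0.000 and concluded the model was broken.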
## Quick Start Commands

**Wake Word Mode:**

```bash
cd /userdata/voice-assistant
python3 voice_assistant_wake.py
```

**Button Mode:**

```bash
cd /userdata/voice-assistant
python3 voice_assistant_button.py
```
**Auto-start on Boot (Already Enabled):**

```bash
# Check service status
batocera-services list

# The voice assistant now starts automatically at boot!
# View the log:
tail -f /tmp/voice-assistant.log
```

See `docs/service-setup.md` for complete service documentation.
## Test Results

```
============================================================
   AIY Voice HAT - Wake Word Assistant (Working!)
============================================================
Loading wake word model...
✓ Model loaded
  Threshold: 0.5
  Hardware: 48000Hz → Model: 16000Hz
  Resampling: YES
============================================================

👂 Listening for 'Hey Jarvis'... (activation #1)
Wake Word Score: 0.878 [==============================]

🎉 WAKE WORD DETECTED! (score: 0.878)
🎤 Recording 5 seconds...
📝 Transcribing...
👤 You: What is the weather like?
🤔 Thinking...
🤖 Assistant: I don't have access to real-time weather data, but I can help you understand weather patterns or discuss general climate information. Would you like to know about how weather forecasting works?
```
## Architecture

### Wake Word Flow

```
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│ "Hey Jarvis" │────▶│ SoundDevice  │────▶│  Resample    │
│ (User says)  │     │  plughw:0,0  │     │ scipy.signal │
└──────────────┘     │   48000Hz    │     │ 48000→16000  │
                     └──────────────┘     └──────┬───────┘
                                                 │
┌──────────────┐     ┌──────────────┐            │
│  Reset &     │◀────│  Check       │◀───────────┘
│  Process     │     │  prediction_ │
│  Command     │     │  buffer      │
└──────┬───────┘     └──────────────┘
       │
       ▼
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│  Record      │────▶│  Transcribe  │────▶│  LLM         │
│  arecord     │     │  whisper-cli │     │  Ollama      │
│  plughw:0,0  │     │  + libraries │     │  llama3.2    │
└──────────────┘     └──────────────┘     └──────┬───────┘
                                                 │
┌──────────────┐                                 │
│  Play        │◀────────────────────────────────┘
│  aplay       │   (speak response)
│  AIY HAT     │
└──────────────┘
```
### Button Flow

```
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│ Button Press │────▶│  LED Blink   │────▶│  Record      │
│  GPIO 23     │     │  GPIO 25     │     │  arecord     │
└──────────────┘     └──────────────┘     │  plughw:0,0  │
                                          └──────┬───────┘
                                                 │
       ┌─────────────────────────────────────────┘
       │
       ▼
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│  Transcribe  │────▶│  LLM         │────▶│  Play        │
│  whisper-cli │     │  Ollama      │     │  aplay       │
│  + libraries │     │  llama3.2    │     │  AIY HAT     │
└──────────────┘     └──────────────┘     └──────────────┘
```
## What You Can Do Now

- **Use it immediately** - Both modes are ready to go
- **Customize the wake word** - Train your own openWakeWord model
- **Add more features** - Multiple wake words, different LLM models
- **Integrate with Batocera** - Launch games via voice command
- **Create custom responses** - Personalized assistant personality
## 📖 Read the Documentation!

- `wrong-assumptions.md` - Learn from our mistakes (highly recommended)
- `wake-word-working.md` - Deep dive into the wake word solution
- `setup-guide.md` - Complete setup for new installations
## 🎓 Lessons Learned

- **Never assume** - Test every assumption about audio, models, and hardware
- **Find working examples** - `be-more-agent` had the answers we needed
- **Audio quality matters** - Proper resampling and format are critical
- **Documentation lies** - Read the source code when things don't work
- **Hardware is rarely broken** - Software/configuration issues are more common
## 🏆 Final Status

**PROJECT STATUS: ✅ COMPLETE AND WORKING**

Both wake word and button activation are fully functional and ready for daily use. The assistant runs entirely offline with local STT, LLM, and TTS.

- Total development time: ~25 hours
- Major breakthrough: wake word detection (the hardest part)
- Lines of code: ~500 across all implementations
- Documentation: ~1000 lines across 3 comprehensive guides
## 🙏 Credits & Acknowledgments

Wake word implementation inspired by `be-more-agent` by Brendan Polyak.

The working wake word detection approach was adapted from studying `be-more-agent`'s audio processing methodology, which helped identify:
- The importance of `int16` audio format (not `float32`)
- Proper resampling with `scipy.signal.resample` (not simple downsampling)
- Checking `prediction_buffer` instead of immediate prediction results
Thank you to the open source community for making local AI accessible!
Enjoy your fully offline, voice-controlled AI assistant! 🤖🎙️
Say “Hey Jarvis” or press the button to start talking to your AI.