🎉 BREAKTHROUGH: Wake Word Detection Now Working!
Summary
After extensive debugging and reverse-engineering be-more-agent's working implementation, wake word detection is now fully functional on the Raspberry Pi 5 + Google AIY Voice HAT v1!
Key Fixes (What Made It Work)
The original wake word implementation failed with scores of ~0.000. The corrected version achieves scores of 0.5-0.95. Here's what was wrong and what fixed it:
❌ Original Approach (Failed)
# WRONG: Simple downsampling
audio_data = audio_data[::3] # Destroys audio quality!
# WRONG: float32 format
audio_data = np.frombuffer(indata, dtype=np.float32)
# WRONG: Checking immediate prediction
prediction = oww_model.predict(audio_data)
if prediction > threshold: # Always ~0.000
✅ Corrected Approach (Working!)
# CORRECT: Proper resampling with scipy
from scipy import signal
audio_data = signal.resample(audio_data, CHUNK_SIZE).astype(np.int16)
# CORRECT: int16 format (matches model expectations)
audio_data = np.frombuffer(indata, dtype=np.int16).flatten()
# CORRECT: Check prediction_buffer (accumulated predictions)
oww_model.predict(audio_data)  # Just updates the buffer
for mdl in oww_model.prediction_buffer.keys():
    score = list(oww_model.prediction_buffer[mdl])[-1]
    if score > WAKE_WORD_THRESHOLD:  # Now works!
The Critical Differences
| Aspect | Original (Broken) | Corrected (Working) |
|---|---|---|
| Resampling | Simple [::3] downsampling | scipy.signal.resample() with interpolation |
| Data Type | float32 | int16 |
| Score Check | Immediate prediction result | prediction_buffer (accumulated history) |
| Typical Scores | ~0.000 | 0.5-0.95 |
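The difference between the two resampling strategies can be exercised offline with a synthetic tone (no HAT needed). `CHUNK_SIZE = 1280` below is an assumption (80 ms at 16 kHz, a common openWakeWord frame size); the actual script's constant may differ.

```python
import numpy as np
from scipy import signal

HW_RATE = 48000     # AIY HAT capture rate
MODEL_RATE = 16000  # openWakeWord expects 16 kHz
CHUNK_SIZE = 1280   # assumed: 80 ms at 16 kHz

def resample_chunk(raw_bytes: bytes) -> np.ndarray:
    """Convert one 48 kHz int16 capture buffer to a 16 kHz int16 chunk."""
    audio = np.frombuffer(raw_bytes, dtype=np.int16).flatten()
    # signal.resample uses FFT-based interpolation instead of the
    # aliasing-prone plain [::3] decimation.
    return signal.resample(audio, CHUNK_SIZE).astype(np.int16)

# Synthetic 80 ms test tone at the hardware rate (3840 samples at 48 kHz)
t = np.arange(3840) / HW_RATE
tone = (0.3 * 32767 * np.sin(2 * np.pi * 440 * t)).astype(np.int16)

chunk = resample_chunk(tone.tobytes())
print(chunk.shape, chunk.dtype)  # (1280,) int16
```

Plain `tone[::3]` also yields 1280 samples, but decimating without a low-pass filter folds everything above 8 kHz back into the band, which is what corrupted the original pipeline.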
Working Files
Production Wake Word Assistant
voice_assistant_wake.py
- Continuous wake word detection
- Listens for "Hey Jarvis"
- Records command after detection
- Transcribes with whisper.cpp
- Gets LLM response from Ollama
- Speaks response via Piper TTS
- Returns to listening mode
Button-Based Alternative (Still Available)
voice_assistant_button.py
- Physical button trigger on GPIO 23
- More reliable in noisy environments
- Use this if wake word detection is inconsistent
Test Results
🎧 Listening for 'Hey Jarvis'... (activation #1)
[Wake Word Score: 0.878] [==============================]
🎯 WAKE WORD DETECTED! (score: 0.878)
🎤 Recording 5 seconds...
📝 Transcribing...
🗣 You: Hey Jarvis.
🤔 Thinking...
🤖 Assistant: Hello! How can I help you today?
Usage
Start Wake Word Assistant
cd /userdata/voice-assistant
python3 voice_assistant_wake.py
Start Button Assistant (Alternative)
cd /userdata/voice-assistant
python3 voice_assistant_button.py
Run at Boot (Systemd Service)
# Create service file
cat > /tmp/voice-assistant.service << 'EOF'
[Unit]
Description=AIY Voice Assistant
After=network.target ollama.service
[Service]
Type=simple
WorkingDirectory=/userdata/voice-assistant
Environment=LD_LIBRARY_PATH=/userdata/voice-assistant
Environment=PYTHONPATH=/userdata/voice-assistant/lib
ExecStart=/usr/bin/python3 /userdata/voice-assistant/voice_assistant_wake.py
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target
EOF
# Install and enable (copy out of /tmp first -- /tmp is wiped at boot,
# so enabling the unit there would leave a dangling symlink)
cp /tmp/voice-assistant.service /etc/systemd/system/
systemctl daemon-reload
systemctl enable --now voice-assistant
Technical Details
Why These Changes Matter
- Proper Resampling: Simple [::3] downsampling throws away 2/3 of the audio data and causes aliasing. scipy.signal.resample() uses proper interpolation to create a clean 16kHz signal from the 48kHz hardware capture.
- int16 Format: The wake word model was trained on int16 audio. Using float32 changes the amplitude scaling, confusing the model.
- prediction_buffer: OpenWakeWord uses a sliding window of predictions, not instantaneous results. Checking the buffer gives accumulated confidence over multiple audio chunks.
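Because the buffer is a history rather than a single value, it also makes simple debouncing easy. The sketch below is not from voice_assistant_wake.py; CONSECUTIVE_FRAMES is an invented tuning knob, shown only to illustrate the accumulated-confidence idea:

```python
from collections import deque

WAKE_WORD_THRESHOLD = 0.5
CONSECUTIVE_FRAMES = 3  # assumed smoothing depth, tune for your setup

recent = deque(maxlen=CONSECUTIVE_FRAMES)

def is_triggered(latest_score: float) -> bool:
    """Fire only when the last few scores all exceed the threshold,
    mirroring how prediction_buffer accumulates history over chunks."""
    recent.append(latest_score)
    return (len(recent) == CONSECUTIVE_FRAMES
            and all(s > WAKE_WORD_THRESHOLD for s in recent))

# One noisy spike should not trigger; a sustained run should.
print([is_triggered(s) for s in [0.9, 0.1, 0.1]])   # [False, False, False]
print([is_triggered(s) for s in [0.8, 0.85, 0.9]])  # [False, False, True]
```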
Audio Pipeline
AIY HAT (48kHz) → SoundDevice → scipy.signal.resample → int16 → OpenWakeWord (16kHz)
        ↓
[Wake Word Detected]
        ↓
arecord (16kHz) → whisper.cpp → Ollama → Piper → aplay
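The post-detection half of this pipeline can be sketched as command/request builders. Binary paths, the Piper voice file, and the Ollama model tag are all assumptions to verify against your installation:

```python
import json

WHISPER_BIN = "./main"                    # whisper.cpp CLI (assumed path)
WHISPER_MODEL = "models/ggml-base.en.bin"
PIPER_MODEL = "en_US-voice.onnx"          # hypothetical voice file name
OLLAMA_URL = "http://localhost:11434/api/generate"

def whisper_cmd(wav_path: str) -> list[str]:
    # -m selects the model, -f the input WAV, -nt drops timestamps
    return [WHISPER_BIN, "-m", WHISPER_MODEL, "-f", wav_path, "-nt"]

def ollama_body(prompt: str, model: str = "llama3.2") -> bytes:
    # Non-streaming request body for Ollama's /api/generate endpoint
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()

def piper_cmd(out_wav: str) -> list[str]:
    # Piper reads text on stdin and writes a WAV to --output_file
    return ["piper", "--model", PIPER_MODEL, "--output_file", out_wav]

print(whisper_cmd("/tmp/command.wav"))
```

Keeping each stage as a plain argument list makes it easy to swap in subprocess.run() calls (or mock them in tests) without touching the pipeline logic.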
Next Steps
- ✅ DONE: Wake word detection working
- ✅ DONE: Recording working
- ✅ DONE: Transcription working
- ✅ DONE: LLM integration working
- ✅ DONE: TTS working
Optional Enhancements
- Add multiple wake word models
- Implement confidence threshold adjustment
- Add LED feedback during listening
- Create custom wake word models
Troubleshooting
Wake Word Not Detected
- Speak clearly and close to the microphone
- Check audio levels: python3 check_levels.py
- Try adjusting threshold: WAKE_WORD_THRESHOLD = 0.4 (lower = more sensitive)
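check_levels.py itself isn't reproduced in this document; a minimal stand-in for the level computation (a hypothetical helper, assuming int16 capture as elsewhere in the pipeline) might look like:

```python
import numpy as np

def rms_dbfs(samples: np.ndarray) -> float:
    """RMS level of int16 audio in dBFS (0 dBFS = full scale)."""
    x = samples.astype(np.float64) / 32768.0
    rms = np.sqrt(np.mean(np.square(x)))
    return 20 * np.log10(max(rms, 1e-10))  # floor avoids log(0) on silence

# A very quiet buffer vs. a full-scale buffer
quiet = np.full(1600, 100, dtype=np.int16)
loud = (32767 * np.ones(1600)).astype(np.int16)
print(round(rms_dbfs(quiet)), round(rms_dbfs(loud)))  # -50 0
```

Normal speech at a sensible mic gain typically lands well above the quiet figure; if your captures sit near it, raise the capture volume before lowering the wake word threshold.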
Recording Fails
- Ensure no other process is using the audio device
- Check ALSA device: arecord -D plughw:0,0 -t wav -d 3 /tmp/test.wav
Transcription Issues
- Verify whisper.cpp binary is compiled for Pi 5 (ARM64)
- Check model file exists: ls -la models/ggml-base.en.bin
Conclusion
The wake word voice assistant is now fully functional!
Both options are available:
- Wake Word: Hands-free, natural interaction
- Button: More reliable, explicit control
Choose based on your preference and environment.