🎉 BREAKTHROUGH: Wake Word Detection Now Working!

Summary

After extensive debugging and reverse-engineering be-more-agent’s working implementation, wake word detection is now fully functional on the Raspberry Pi 5 + Google AIY Voice HAT v1!

Key Fixes (What Made It Work)

The original wake word implementation never triggered: its scores stayed at about 0.000. The corrected version reaches scores of 0.5-0.95. Here’s what was wrong and what fixed it:

❌ Original Approach (Failed)

# WRONG: float32 format
audio_data = np.frombuffer(indata, dtype=np.float32)

# WRONG: Simple downsampling
audio_data = audio_data[::3]  # Destroys audio quality!

# WRONG: Checking immediate prediction
prediction = oww_model.predict(audio_data)
if prediction > threshold:  # Always ~0.000
    pass  # never reached

✅ Corrected Approach (Working!)

import numpy as np
from scipy import signal

# CORRECT: int16 format (matches model expectations)
audio_data = np.frombuffer(indata, dtype=np.int16).flatten()

# CORRECT: Proper resampling with scipy (48 kHz block -> CHUNK_SIZE samples at 16 kHz)
audio_data = signal.resample(audio_data, CHUNK_SIZE).astype(np.int16)

# CORRECT: Check prediction_buffer (accumulated predictions)
oww_model.predict(audio_data)  # Just updates the buffer
for mdl in oww_model.prediction_buffer.keys():
    score = list(oww_model.prediction_buffer[mdl])[-1]
    if score > WAKE_WORD_THRESHOLD:  # Now works!
        print(f"🎉 WAKE WORD DETECTED! (score: {score:.3f})")

The Critical Differences

Aspect	Original (Broken)	Corrected (Working)
Resampling	Simple `[::3]` downsampling (no anti-aliasing filter)	`scipy.signal.resample()` (FFT-based resampling)
Data Type	`float32`	`int16`
Score Check	Immediate prediction result	`prediction_buffer` (accumulated history)
Typical Scores	~0.000	0.5-0.95
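
The first two rows of the table can be combined into one small conversion helper. This is a minimal sketch, not the code from voice_assistant_wake.py: the function name `to_model_input` is hypothetical, and `CHUNK_SIZE = 1280` (80 ms at 16 kHz) is an assumed value, so use whatever the real script defines. The clipping step is an addition of this sketch to keep the int16 cast from wrapping around if resampling overshoots.

```python
import numpy as np
from scipy import signal

HW_RATE = 48_000     # capture rate of the AIY HAT, per the pipeline in this document
MODEL_RATE = 16_000  # rate the wake word model expects
CHUNK_SIZE = 1280    # ASSUMPTION: 80 ms at 16 kHz; the real script may differ


def to_model_input(block: np.ndarray, chunk_size: int = CHUNK_SIZE) -> np.ndarray:
    """Convert one 48 kHz int16 capture block into a 16 kHz int16 chunk."""
    # Flatten (frames, channels) input to 1-D and work in float for the FFT.
    mono = np.asarray(block).reshape(-1).astype(np.float64)
    resampled = signal.resample(mono, chunk_size)
    # Round and clip so the cast back to int16 cannot overflow.
    clipped = np.clip(np.rint(resampled), -32768, 32767)
    return clipped.astype(np.int16)


if __name__ == "__main__":
    # Synthetic 500 Hz test tone standing in for one microphone block.
    t = np.arange(CHUNK_SIZE * 3) / HW_RATE
    tone = (0.5 * 32767 * np.sin(2 * np.pi * 500 * t)).astype(np.int16)
    chunk = to_model_input(tone)
    print(chunk.dtype, chunk.shape)
```

The capture block here is three times `CHUNK_SIZE` because 48 kHz / 16 kHz = 3, so each hardware block maps to exactly one model chunk.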

Working Files

Production Wake Word Assistant

voice_assistant_wake.py - Continuous wake word detection
- Listens for “Hey Jarvis”
- Records command after detection
- Transcribes with whisper.cpp
- Gets LLM response from Ollama
- Speaks response via Piper TTS
- Returns to listening mode

Button-Based Alternative (Still Available)

voice_assistant_button.py - Physical button trigger on GPIO 23
- More reliable in noisy environments
- Use this if wake word is inconsistent

Test Results

👂 Listening for 'Hey Jarvis'... (activation #1)
[Wake Word Score: 0.878] [==============================]
🎉 WAKE WORD DETECTED! (score: 0.878)
🎤 Recording 5 seconds...
📝 Transcribing...
👤 You: Hey Jarvis.
🤔 Thinking...
🤖 Assistant: Hello! How can I help you today?

Usage

Start Wake Word Assistant

cd /userdata/voice-assistant
python3 voice_assistant_wake.py

Start Button Assistant (Alternative)

cd /userdata/voice-assistant
python3 voice_assistant_button.py

Run at Boot (Systemd Service)

# Create service file
cat > /tmp/voice-assistant.service << 'EOF'
[Unit]
Description=AIY Voice Assistant
After=network.target ollama.service

[Service]
Type=simple
WorkingDirectory=/userdata/voice-assistant
Environment=LD_LIBRARY_PATH=/userdata/voice-assistant
Environment=PYTHONPATH=/userdata/voice-assistant/lib
ExecStart=/usr/bin/python3 /userdata/voice-assistant/voice_assistant_wake.py
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF

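# Install and enable (copy out of /tmp so the unit survives a reboot)
cp /tmp/voice-assistant.service /etc/systemd/system/voice-assistant.service
systemctl daemon-reload
systemctl enable voice-assistant.service
systemctl start voice-assistant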

Technical Details

Why These Changes Matter

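Proper Resampling: Taking every third sample with [::3] applies no anti-aliasing filter, so any content above 8 kHz in the 48 kHz capture folds back into the 16 kHz signal as distortion. scipy.signal.resample() resamples in the frequency domain (Fourier method) and discards content above the new Nyquist limit, producing a clean 16kHz signal from 48kHz hardware.
int16 Format: OpenWakeWord expects 16-bit PCM (int16) samples. float32 audio is typically scaled to [-1.0, 1.0], so the model sees amplitudes thousands of times smaller than it expects, which is consistent with the ~0.000 scores.
prediction_buffer: OpenWakeWord keeps a rolling history of recent scores for each loaded model in prediction_buffer. The working code calls predict() to update that history on every audio chunk, then reads the most recent entry for each model.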
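
The aliasing point is easy to check on synthetic audio. The sketch below is an illustration, not part of the assistant: it generates a 10 kHz tone at 48 kHz, which lies above the 8 kHz Nyquist limit of a 16 kHz stream, and compares the two downsampling methods. The `rms` helper and the test tone are assumptions made for the demo.

```python
import numpy as np
from scipy import signal

HW_RATE = 48_000
MODEL_RATE = 16_000


def rms(x: np.ndarray) -> float:
    """Root-mean-square level of a signal."""
    return float(np.sqrt(np.mean(np.square(x))))


# 0.1 s of a 10 kHz tone: too high to be represented at 16 kHz.
n = HW_RATE // 10
t = np.arange(n) / HW_RATE
tone = np.sin(2 * np.pi * 10_000 * t)

naive = tone[::3]                      # every third sample, no filtering
clean = signal.resample(tone, n // 3)  # Fourier-method resampling

# Locate the strongest frequency left in the naively decimated signal.
spectrum = np.abs(np.fft.rfft(naive))
alias_hz = np.argmax(spectrum) * MODEL_RATE / len(naive)

print(f"naive RMS: {rms(naive):.3f}, strongest component: {alias_hz:.0f} Hz")
print(f"clean RMS: {rms(clean):.3f}")
```

With `[::3]` the tone survives at full strength but folded down to a frequency that was never in the original audio; with `scipy.signal.resample()` it is removed.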

Audio Pipeline

AIY HAT (48kHz) → SoundDevice → scipy.signal.resample → int16 → OpenWakeWord (16kHz)
                                    ↓
                              [Wake Word Detected]
                                    ↓
                           arecord (16kHz) → whisper.cpp → Ollama → Piper → aplay
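
One way to read the diagram is as a single cycle of five stages. The sketch below shows that control flow only, with each stage passed in as a callable; every function and parameter name is hypothetical, and the real script's wrappers around arecord, whisper.cpp, Ollama, and Piper are not reproduced here.

```python
from typing import Callable


def run_once(
    wait_for_wake_word: Callable[[], float],
    record_command: Callable[[], str],
    transcribe: Callable[[str], str],
    ask_llm: Callable[[str], str],
    speak: Callable[[str], None],
) -> str:
    """Run one wake -> record -> transcribe -> respond -> speak cycle."""
    score = wait_for_wake_word()         # blocks until detection, returns the score
    print(f"WAKE WORD DETECTED! (score: {score:.3f})")
    wav_path = record_command()          # e.g. a wrapper around arecord
    text = transcribe(wav_path).strip()  # e.g. a wrapper around whisper.cpp
    if not text:
        return ""                        # nothing intelligible: back to listening
    reply = ask_llm(text)                # e.g. a request to Ollama
    speak(reply)                         # e.g. Piper output played with aplay
    return reply


if __name__ == "__main__":
    # Stub stages so the control flow can be exercised without hardware.
    reply = run_once(
        wait_for_wake_word=lambda: 0.878,
        record_command=lambda: "/tmp/command.wav",
        transcribe=lambda path: "Hey Jarvis.",
        ask_llm=lambda text: "Hello! How can I help you today?",
        speak=lambda text: print(f"Assistant: {text}"),
    )
```

Passing the stages in as callables keeps the loop testable on a machine without the HAT attached, and makes it straightforward to swap the wake word stage for the button trigger.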

Next Steps

✅ DONE: Wake word detection working
✅ DONE: Recording working
✅ DONE: Transcription working
✅ DONE: LLM integration working
✅ DONE: TTS working

Optional Enhancements

Add multiple wake word models
Implement confidence threshold adjustment
Add LED feedback during listening
Create custom wake word models

Troubleshooting

Wake Word Not Detected

Speak clearly and close to the microphone
Check audio levels: python3 check_levels.py
Try lowering the threshold: WAKE_WORD_THRESHOLD = 0.4 (lower = more sensitive, but more false activations)
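
When tuning the threshold it helps to see how the same scores behave at different settings. The sketch below applies a threshold to a mapping shaped like `prediction_buffer` (model name to a sequence of recent scores). The helper name `best_detection`, the default of 0.5, and the `hey_jarvis` key are assumptions for illustration; it runs on plain deques, so it needs no model or microphone.

```python
from collections import deque
from typing import Mapping, Optional, Sequence, Tuple

WAKE_WORD_THRESHOLD = 0.5  # ASSUMPTION: illustrative default, not the script's value


def best_detection(
    prediction_buffer: Mapping[str, Sequence[float]],
    threshold: float = WAKE_WORD_THRESHOLD,
) -> Optional[Tuple[str, float]]:
    """Return (model_name, score) for the highest latest score above threshold."""
    best: Optional[Tuple[str, float]] = None
    for name, scores in prediction_buffer.items():
        if len(scores) == 0:
            continue  # no predictions recorded for this model yet
        latest = float(list(scores)[-1])
        if latest > threshold and (best is None or latest > best[1]):
            best = (name, latest)
    return best


if __name__ == "__main__":
    # Hypothetical buffer contents after a quiet utterance.
    buffer = {"hey_jarvis": deque([0.01, 0.02, 0.45], maxlen=30)}
    print(best_detection(buffer, threshold=0.5))
    print(best_detection(buffer, threshold=0.4))
```

Logging the scores for a few real utterances and replaying them through a helper like this shows whether a lower threshold would have caught them before changing the live setting.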

Recording Fails

Ensure no other process is using the audio device
Check ALSA device: arecord -D plughw:0,0 -t wav -d 3 /tmp/test.wav

Transcription Issues

Verify whisper.cpp binary is compiled for Pi 5 (ARM64)
Check model file exists: ls -la models/ggml-base.en.bin

Conclusion

The wake word voice assistant is now fully functional!

Both options are available:

Wake Word: Hands-free, natural interaction
Button: More reliable, explicit control

Choose based on your preference and environment.

Botface Voice Assistant Documentation