🎉 BREAKTHROUGH: Wake Word Detection Now Working!

Summary

After extensive debugging and reverse-engineering be-more-agent's working implementation, wake word detection is now fully functional on the Raspberry Pi 5 + Google AIY Voice HAT v1!

Key Fixes (What Made It Work)

The original wake word implementation failed with scores ~0.000. The corrected version achieves scores of 0.5-0.95. Here's what was wrong and what fixed it:

❌ Original Approach (Failed)

# WRONG: Simple downsampling
audio_data = audio_data[::3]  # Destroys audio quality!

# WRONG: float32 format
audio_data = np.frombuffer(indata, dtype=np.float32)

# WRONG: Treating predict()'s return value as a single score
prediction = oww_model.predict(audio_data)  # returns per-model scores, not one float
if prediction > threshold:  # Never fired; observed scores were ~0.000
    pass

✅ Corrected Approach (Working!)

from scipy import signal

# CORRECT: Read the raw buffer as int16 (matches model expectations)
audio_data = np.frombuffer(indata, dtype=np.int16).flatten()

# CORRECT: Proper resampling with scipy (48 kHz block -> CHUNK_SIZE samples at 16 kHz)
audio_data = signal.resample(audio_data, CHUNK_SIZE).astype(np.int16)

# CORRECT: Read each model's latest score from prediction_buffer
oww_model.predict(audio_data)  # Updates prediction_buffer
for mdl in oww_model.prediction_buffer.keys():
    score = list(oww_model.prediction_buffer[mdl])[-1]
    if score > WAKE_WORD_THRESHOLD:  # Now works!
        print(f"Wake word detected ({mdl}: {score:.3f})")

The Critical Differences

Aspect         | Original (Broken)           | Corrected (Working)
Resampling     | Simple [::3] downsampling   | scipy.signal.resample() with interpolation
Data Type      | float32                     | int16
Score Check    | Immediate prediction result | prediction_buffer (accumulated history)
Typical Scores | ~0.000                      | 0.5-0.95
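
The score check in the last rows of the table can be tried without loading a model. The sketch below assumes, as in the corrected snippet above, that prediction_buffer maps each model name to a sequence of recent scores. The helper name models_over_threshold, the threshold value, and the sample scores are hypothetical.

```python
from collections import deque

WAKE_WORD_THRESHOLD = 0.5  # assumed value; tune for your microphone


def models_over_threshold(prediction_buffer, threshold=WAKE_WORD_THRESHOLD):
    """Return {model_name: latest_score} for models whose newest score exceeds threshold."""
    hits = {}
    for name, scores in prediction_buffer.items():
        if not scores:       # no predictions recorded yet for this model
            continue
        latest = scores[-1]  # newest entry; deques support negative indexing
        if latest > threshold:
            hits[name] = float(latest)
    return hits


# Stand-in for oww_model.prediction_buffer (hypothetical model names and values)
fake_buffer = {
    "hey_jarvis": deque([0.01, 0.12, 0.878], maxlen=30),
    "alexa": deque([0.00, 0.02, 0.01], maxlen=30),
}
print(models_over_threshold(fake_buffer))
```

Returning a dict instead of a boolean keeps the model name and score available for logging, which helps when tuning the threshold.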

Working Files

Production Wake Word Assistant

  • voice_assistant_wake.py - Continuous wake word detection
    • Listens for “Hey Jarvis”
    • Records command after detection
    • Transcribes with whisper.cpp
    • Gets LLM response from Ollama
    • Speaks response via Piper TTS
    • Returns to listening mode

Button-Based Alternative (Still Available)

  • voice_assistant_button.py - Physical button trigger on GPIO 23
    • More reliable in noisy environments
    • Use this if wake word is inconsistent

Test Results

👂 Listening for 'Hey Jarvis'... (activation #1)
[Wake Word Score: 0.878] [==============================]
🎉 WAKE WORD DETECTED! (score: 0.878)
🎤 Recording 5 seconds...
📝 Transcribing...
👤 You: Hey Jarvis.
🤔 Thinking...
🤖 Assistant: Hello! How can I help you today?

Usage

Start Wake Word Assistant

cd /userdata/voice-assistant
python3 voice_assistant_wake.py

Start Button Assistant (Alternative)

cd /userdata/voice-assistant
python3 voice_assistant_button.py

Run at Boot (Systemd Service)

# Create service file (run as root). Unit files in /tmp do not survive a reboot,
# so write it to the systemd unit directory instead.
cat > /etc/systemd/system/voice-assistant.service << 'EOF'
[Unit]
Description=AIY Voice Assistant
After=network.target ollama.service

[Service]
Type=simple
WorkingDirectory=/userdata/voice-assistant
Environment=LD_LIBRARY_PATH=/userdata/voice-assistant
Environment=PYTHONPATH=/userdata/voice-assistant/lib
ExecStart=/usr/bin/python3 /userdata/voice-assistant/voice_assistant_wake.py
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF

# Reload unit files, then enable at boot and start now
systemctl daemon-reload
systemctl enable voice-assistant
systemctl start voice-assistant

Technical Details

Why These Changes Matter

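  1. Proper Resampling: Slicing with [::3] keeps every third sample without low-pass filtering first, so any content above 8 kHz (the Nyquist limit at 16 kHz) aliases down into the speech band. scipy.signal.resample() resamples in the Fourier domain, which band-limits the 48kHz hardware signal while producing the 16kHz output.

  2. int16 Format: The wake word model expects 16-bit PCM (int16) samples in the range -32768 to 32767. float32 audio is normally scaled to [-1.0, 1.0], so passing it unconverted changes the amplitude scaling and the model sees a signal that is close to silence.

  3. prediction_buffer: Each predict() call appends the current chunk's score for every loaded model to prediction_buffer, which keeps a short history of recent scores per model name. Reading the last entry for each model gives a single float to compare against the threshold, and the history shows how the score evolves across consecutive audio chunks.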
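
The aliasing described in point 1 can be reproduced with a synthetic signal. The sketch below mixes a 1 kHz tone with a 10 kHz tone at 48 kHz, then compares [::3] slicing against scipy.signal.resample(). The test frequencies and the helper band_fraction are illustrative assumptions, not part of the assistant's code.

```python
import numpy as np
from scipy import signal

HW_RATE = 48_000     # hardware capture rate from the text
MODEL_RATE = 16_000  # rate the wake word model expects
TONE_HZ = 10_000     # hypothetical test tone above the 8 kHz Nyquist limit
ALIAS_HZ = MODEL_RATE - TONE_HZ  # where that tone folds to after naive slicing


def band_fraction(x, rate, center_hz, half_width_hz=100):
    """Fraction of spectral energy within half_width_hz of center_hz."""
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / rate)
    band = np.abs(freqs - center_hz) <= half_width_hz
    return power[band].sum() / power.sum()


# One second of a 1 kHz "speech-band" tone plus the 10 kHz tone, sampled at 48 kHz
t = np.arange(HW_RATE) / HW_RATE
x = np.sin(2 * np.pi * 1_000 * t) + np.sin(2 * np.pi * TONE_HZ * t)

naive = x[::3]                          # keep every third sample, no filtering
clean = signal.resample(x, MODEL_RATE)  # Fourier-domain resampling to 16 kHz

naive_alias = band_fraction(naive, MODEL_RATE, ALIAS_HZ)
clean_alias = band_fraction(clean, MODEL_RATE, ALIAS_HZ)
print(f"energy near {ALIAS_HZ} Hz: naive={naive_alias:.3f}, resampled={clean_alias:.3f}")
```

In this synthetic case about half of the sliced signal's energy lands at 6 kHz, a frequency that was never present in the input, while the resampled signal has essentially none there.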

Audio Pipeline

AIY HAT (48kHz) → SoundDevice → scipy.signal.resample → int16 → OpenWakeWord (16kHz)
                                    ↓
                              [Wake Word Detected]
                                    ↓
                           arecord (16kHz) → whisper.cpp → Ollama → Piper → aplay
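
The first row of the pipeline implies some block-size arithmetic and a sample-format conversion. The sketch below assumes CHUNK_SIZE is 1280 samples (80 ms at 16 kHz), because the source does not state its value. The helper float_to_int16 is hypothetical and only applies when the input stream delivers float32 blocks, which is sounddevice's default dtype.

```python
import numpy as np

HW_RATE = 48_000     # AIY HAT capture rate
MODEL_RATE = 16_000  # OpenWakeWord input rate
CHUNK_SIZE = 1280    # assumption: 80 ms of audio at 16 kHz

# Samples to read from the 48 kHz device so that resampling yields CHUNK_SIZE samples
HW_BLOCK = CHUNK_SIZE * HW_RATE // MODEL_RATE


def float_to_int16(block):
    """Scale float audio in [-1.0, 1.0] to int16 PCM, clipping out-of-range samples."""
    clipped = np.clip(np.asarray(block, dtype=np.float64), -1.0, 1.0)
    return np.round(clipped * 32767.0).astype(np.int16)


print(HW_BLOCK)
print(float_to_int16([0.0, -1.0, 1.2]))
```

Opening the stream with an int16 dtype avoids the conversion entirely. The helper is only a fallback for code that already receives float blocks.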

Next Steps

  1. ✅ DONE: Wake word detection working
  2. ✅ DONE: Recording working
  3. ✅ DONE: Transcription working
  4. ✅ DONE: LLM integration working
  5. ✅ DONE: TTS working

Optional Enhancements

  • Add multiple wake word models
  • Implement confidence threshold adjustment
  • Add LED feedback during listening
  • Create custom wake word models

Troubleshooting

Wake Word Not Detected

  • Speak clearly and close to the microphone
  • Check audio levels: python3 check_levels.py
  • Try adjusting threshold: WAKE_WORD_THRESHOLD = 0.4 (lower = more sensitive)
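
The contents of check_levels.py are not shown here, so the following is only a sketch of what a level check might compute, not the script itself. It reports peak and RMS level in dBFS for a block of int16 samples. The function name level_dbfs and the synthetic test tone are assumptions.

```python
import numpy as np


def level_dbfs(samples):
    """Return (peak_dbfs, rms_dbfs) for int16 samples; 0 dBFS is full scale."""
    x = np.asarray(samples, dtype=np.float64) / 32768.0
    if x.size == 0:
        raise ValueError("no samples")
    peak = np.max(np.abs(x))
    rms = np.sqrt(np.mean(x ** 2))
    eps = 1e-12  # avoids log10(0) on pure silence
    return 20 * np.log10(max(peak, eps)), 20 * np.log10(max(rms, eps))


# Synthetic input: a 440 Hz tone at one tenth of full scale, one second at 16 kHz
t = np.arange(16_000) / 16_000
tone = (0.1 * 32767 * np.sin(2 * np.pi * 440 * t)).astype(np.int16)
peak_db, rms_db = level_dbfs(tone)
print(f"peak {peak_db:.1f} dBFS, rms {rms_db:.1f} dBFS")
```

If speech at normal distance reads far below the level of this test tone, microphone gain is a more likely culprit than the threshold.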

Recording Fails

  • Ensure no other process is using the audio device
  • Check ALSA device: arecord -D plughw:0,0 -t wav -d 3 /tmp/test.wav
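
After a test recording, it can help to confirm what the file actually contains, since arecord falls back to its own defaults when no format or rate flags are given. The sketch below uses only the standard library. The helper names are hypothetical, and it writes a silent demo file so that it runs without a microphone.

```python
import os
import tempfile
import wave


def describe_wav(path):
    """Return (sample_rate_hz, channels, bits_per_sample, duration_s) for a PCM WAV file."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        channels = wf.getnchannels()
        bits = wf.getsampwidth() * 8
        duration = wf.getnframes() / rate
    return rate, channels, bits, duration


def write_silence(path, seconds=1, rate=16_000):
    """Write a mono 16-bit silent WAV, used here only as demo input."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wf.setframerate(rate)
        wf.writeframes(b"\x00\x00" * rate * seconds)


# Demo on a generated file; on the Pi, pass /tmp/test.wav to describe_wav instead
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.wav")
    write_silence(demo_path)
    info = describe_wav(demo_path)
print(info)
```

A recording that reports anything other than 16000 Hz, 1 channel, and 16 bits will not match the 16kHz input shown in the audio pipeline.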

Transcription Issues

  • Verify whisper.cpp binary is compiled for Pi 5 (ARM64)
  • Check model file exists: ls -la models/ggml-base.en.bin

Conclusion

The wake word voice assistant is now fully functional!

Both options are available:

  • Wake Word: Hands-free, natural interaction
  • Button: More reliable, explicit control

Choose based on your preference and environment.