Speech Recognizer And Voice Activity Detector

This chip performs automatic speech recognition (ASR) and voice activity detection (VAD). Compressed neural network acoustic models, combined with a variety of modeling and search techniques, improve ASR accuracy and memory efficiency and allow real-time decoding with roughly 10 MB/s of external memory bandwidth. ASR models trained with the open-source Kaldi toolkit can be imported. We evaluated tasks with vocabulary sizes ranging from 11 words (172 µW) to 145k words (7.78 mW); accuracy is comparable to that of the equivalent Kaldi software recognizer. The VAD enables voice-activated power gating of the ASR and the downstream system. We include three VAD algorithms to investigate tradeoffs between detection performance and power consumption. The modulation frequency algorithm is the most robust in difficult noise environments and consumes 22.3 µW.
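
To illustrate why voice-activated power gating matters, the sketch below estimates average power for an always-on VAD that wakes the ASR only during speech. It is a back-of-the-envelope model, not a measurement: the function name, the speech duty cycles, and the assumption that a gated ASR draws no power (ignoring leakage, wake-up cost, and VAD false alarms) are all hypothetical. Only the three power figures come from the text above.

```python
# Figures quoted in the abstract, in microwatts.
VAD_UW = 22.3          # modulation frequency VAD
ASR_SMALL_UW = 172.0   # 11-word task
ASR_LARGE_UW = 7780.0  # 145k-word task (7.78 mW)


def average_power_uw(vad_uw: float, asr_uw: float, speech_duty_cycle: float) -> float:
    """Average power of an always-on VAD gating an ASR.

    Assumes (hypothetically) that the ASR draws asr_uw while speech is
    present and nothing while gated off, and that the VAD never misfires.
    """
    if not 0.0 <= speech_duty_cycle <= 1.0:
        raise ValueError("speech_duty_cycle must be between 0 and 1")
    return vad_uw + speech_duty_cycle * asr_uw


if __name__ == "__main__":
    # Hypothetical fractions of time during which speech is present.
    for duty in (0.01, 0.05, 0.25, 1.0):
        small = average_power_uw(VAD_UW, ASR_SMALL_UW, duty)
        large = average_power_uw(VAD_UW, ASR_LARGE_UW, duty)
        print(f"duty={duty:4.2f}  11-word: {small:8.1f} uW  145k-word: {large:8.1f} uW")
```

Under these assumptions, the VAD sets the floor on average power whenever speech is rare, which is why the tradeoff between VAD robustness and VAD power is worth studying separately from the recognizer.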