Neuromorphic Keyword Spotting with Pulse Density Modulation MEMS Microphones
Sidi Yaya Arnaud Yarga, Sean U. N. Wood
TL;DR
This work addresses energy-efficient keyword spotting (KWS) by removing intermediate preprocessing stages and connecting a Pulse Density Modulation (PDM) microphone directly to a Spiking Neural Network (SNN). The authors design a 5-layer 1D convolutional SNN using ParaLIF-D neurons, train it with surrogate gradients, and evaluate it on the Google Speech Commands (GSC) dataset, using a parallelized pulse-code modulation (PCM) to PDM conversion to speed up data preparation. They achieve 91.54% accuracy on GSC, surpassing the state of the art on the Spiking Speech Commands (SSC) dataset, a spike-encoded version of GSC. They also demonstrate extreme network sparsity and scalable oversampling, which point to substantial potential energy savings on neuromorphic hardware. The results suggest a viable pathway toward ultra-low-energy, end-to-end neuromorphic KWS and motivate future hardware demonstrations with power measurements.
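To make the PCM-to-PDM step concrete, here is a minimal sketch assuming a first-order delta-sigma modulator and zero-order-hold oversampling. The paper's actual modulator and parallelization scheme are not described in this section, so the function names (`oversample`, `pcm_to_pdm_sequential`, `pcm_to_pdm_parallel`) and the prefix-sum trick are illustrative assumptions, not the authors' method. Practical PDM microphones may use higher-order modulators. The sketch shows that the sequential modulator loop can be replaced by a cumulative sum, which is one way such a conversion can be vectorized.

```python
import numpy as np

def oversample(x, factor):
    """Zero-order-hold oversampling: repeat each PCM sample `factor` times.
    (A simplification; practical converters use interpolation filters.)"""
    return np.repeat(np.asarray(x, dtype=np.float64), factor)

def pcm_to_pdm_sequential(x):
    """First-order delta-sigma modulation of samples in [-1, 1] to a 0/1 bitstream.

    The accumulator integrates u = (x + 1) / 2 and emits a pulse, subtracting 1,
    whenever it reaches 1: an integrate-and-fire unit with threshold 1 and
    reset by subtraction.
    """
    u = (np.clip(np.asarray(x, dtype=np.float64), -1.0, 1.0) + 1.0) / 2.0
    acc = 0.0
    bits = np.zeros(u.shape[0], dtype=np.uint8)
    for n, un in enumerate(u):
        acc += un
        if acc >= 1.0:
            bits[n] = 1
            acc -= 1.0
    return bits

def pcm_to_pdm_parallel(x):
    """Same modulator without the sequential loop.

    The accumulator always stays in [0, 1), so the number of pulses emitted up to
    step n equals floor(cumsum(u)[n]); differencing those counts gives the bits.
    For arbitrary float inputs, rounding in cumsum can, in rare near-integer
    cases, make this differ from the loop.
    """
    u = (np.clip(np.asarray(x, dtype=np.float64), -1.0, 1.0) + 1.0) / 2.0
    counts = np.floor(np.cumsum(u))
    return np.diff(counts, prepend=0.0).astype(np.uint8)

# Example: a 440 Hz tone at 16 kHz, quantized to 1/64 steps, oversampled 4x.
t = np.arange(16000)
x = np.round(0.5 * np.sin(2 * np.pi * 440 * t / 16000) * 64) / 64
x_os = oversample(x, 4)
pdm = pcm_to_pdm_parallel(x_os)
print(np.array_equal(pdm, pcm_to_pdm_sequential(x_os)))  # True for this input
print(pdm.mean())  # pulse density; about 0.5 here, tracking the mean of (x + 1) / 2
```

The input in the example is quantized to multiples of 1/64 so that every partial sum is exactly representable in floating point, which makes the loop and the prefix-sum version agree bit for bit. The pulse density differs from the mean of `(x + 1) / 2` by less than one pulse over the whole stream.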
Abstract
The Keyword Spotting (KWS) task involves continuously monitoring an audio stream to detect predefined words, which requires low-energy devices for continuous processing. Neuromorphic devices effectively address this energy challenge. However, the general neuromorphic KWS pipeline, from microphone to Spiking Neural Network (SNN), entails multiple processing stages. Leveraging the popularity of Pulse Density Modulation (PDM) microphones in modern devices and their similarity to spiking neurons, we propose a direct microphone-to-SNN connection. This approach eliminates intermediate stages, notably reducing computational costs. The system achieved an accuracy of 91.54% on the Google Speech Commands (GSC) dataset, surpassing the state of the art for the Spiking Speech Commands (SSC) dataset, a bio-inspired spike encoding of GSC. Furthermore, the observed sparsity in network activity and connectivity indicates potential for remarkably low energy consumption in a neuromorphic device implementation.
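To illustrate what a direct microphone-to-SNN connection means computationally, the sketch below feeds a 0/1 PDM bitstream straight into a single 1D-convolutional layer of leaky integrate-and-fire (LIF) neurons and reports the output spike rate as a rough sparsity indicator. It rests on loudly stated assumptions: it uses a generic LIF neuron rather than the authors' ParaLIF-D neuron, random untrained weights, and hypothetical parameter values (`beta`, `threshold`, kernel size). The function `lif_conv1d` is a hypothetical name, and this is not the paper's 5-layer network.

```python
import numpy as np

def lif_conv1d(bits, weights, beta=0.9, threshold=1.0):
    """One 1D-convolutional layer of generic LIF neurons driven by PDM pulses.

    bits:    (T,) array of 0/1 PDM pulses
    weights: (C, K) array with one causal kernel per output channel
    Returns a (C, T) array of 0/1 output spikes.
    """
    x = np.asarray(bits, dtype=np.float64)
    weights = np.asarray(weights, dtype=np.float64)
    T = x.shape[0]
    # Causal convolution: current[c, t] = sum_k weights[c, k] * x[t - k].
    # With 0/1 inputs, each product is either 0 or the weight itself.
    current = np.stack([np.convolve(x, w)[:T] for w in weights])
    v = np.zeros(weights.shape[0])
    spikes = np.zeros((weights.shape[0], T), dtype=np.uint8)
    for t in range(T):
        v = beta * v + current[:, t]    # leaky integration of the input current
        fired = v >= threshold
        spikes[fired, t] = 1
        v[fired] -= threshold           # reset by subtraction
    return spikes

# Hypothetical example: random PDM-like input and random (untrained) kernels.
rng = np.random.default_rng(0)
pdm = (rng.random(4000) < 0.5).astype(np.uint8)
kernels = rng.normal(0.0, 0.3, size=(8, 32))
out = lif_conv1d(pdm, kernels)
print(out.shape)   # (8, 4000)
print(out.mean())  # fraction of (neuron, time step) pairs that spiked
```

Because the PDM input is binary, every multiply in the first layer reduces to conditionally adding a weight, and the spike rate `out.mean()` gives a simple measure of activity sparsity. These two properties are the kind of effect the abstract's energy argument relies on, although the sketch makes no claim about actual hardware energy.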
