Neuromorphic Keyword Spotting with Pulse Density Modulation MEMS Microphones

Sidi Yaya Arnaud Yarga, Sean U. N. Wood

TL;DR

This work targets energy-efficient keyword spotting (KWS) by removing intermediate preprocessing and connecting a Pulse Density Modulation (PDM) microphone directly to a Spiking Neural Network (SNN). The authors design a 5-layer 1D convolutional SNN using ParaLIF-D neurons, train it with surrogate gradients, and validate it on Google Speech Commands (GSC), parallelizing the PCM-to-PDM conversion to speed up data preparation. The system reaches 91.54% accuracy on GSC, surpassing the state of the art reported for the Spiking Speech Commands (SSC) dataset, a bio-inspired spike encoding of GSC. The observed sparsity in network activity and connectivity, together with an analysis of how oversampling affects accuracy, points to substantial potential energy savings on neuromorphic hardware. The results suggest a viable pathway toward ultra-low-energy, end-to-end neuromorphic KWS and motivate future hardware demonstrations and power measurements.

Abstract

The Keyword Spotting (KWS) task involves continuous audio stream monitoring to detect predefined words, requiring low-energy devices for continuous processing. Neuromorphic devices effectively address this energy challenge. However, the general neuromorphic KWS pipeline, from microphone to Spiking Neural Network (SNN), entails multiple processing stages. Leveraging the popularity of Pulse Density Modulation (PDM) microphones in modern devices and their similarity to spiking neurons, we propose a direct microphone-to-SNN connection. This approach eliminates intermediate stages, notably reducing computational costs. The system achieved an accuracy of 91.54% on the Google Speech Commands (GSC) dataset, surpassing the state of the art for the Spiking Speech Commands (SSC) dataset, which is a bio-inspired encoding of GSC. Furthermore, the observed sparsity in network activity and connectivity indicates potential for remarkably low energy consumption in a neuromorphic device implementation.
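
To make the claimed similarity between PDM microphones and spiking neurons concrete, here is a minimal sketch of PCM-to-PDM conversion. It assumes a first-order sigma-delta modulator and zero-order-hold upsampling, neither of which is confirmed by the text above; real PDM microphones commonly use higher-order modulators, and the authors' parallelized conversion is not reproduced here. The function name `pcm_to_pdm` and the `oversampling` parameter are hypothetical. The integrator accumulates the input and emits a 1 when it crosses a threshold, much like an integrate-and-fire neuron with subtractive reset.

```python
import numpy as np


def pcm_to_pdm(pcm, oversampling=8):
    """Encode a PCM signal in [-1, 1] as a 0/1 pulse train.

    Assumption: first-order sigma-delta modulation, used only for
    illustration. Returns len(pcm) * oversampling bits.
    """
    # Zero-order-hold upsampling (assumption: a real pipeline would
    # interpolate and low-pass filter instead of repeating samples).
    upsampled = np.repeat(np.asarray(pcm, dtype=np.float64), oversampling)
    bits = np.zeros(upsampled.size, dtype=np.uint8)

    integrator = 0.0  # running sum of (input - fed-back output)
    feedback = 0.0    # previous output mapped to -1 / +1
    for i, x in enumerate(upsampled):
        integrator += x - feedback
        bits[i] = 1 if integrator >= 0.0 else 0
        feedback = 1.0 if bits[i] else -1.0
    return bits


if __name__ == "__main__":
    # A constant input of 0.5 should give a pulse density near (0.5 + 1) / 2.
    pdm = pcm_to_pdm(np.full(100, 0.5), oversampling=8)
    print("pulse density:", pdm.mean())
```

In this sketch the fraction of 1s in the output tracks the input amplitude, so the bit stream can be read as a spike train whose rate encodes the waveform. That is the property that would let a PDM stream feed an SNN without an intermediate feature-extraction stage.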

Paper Structure

This paper contains 17 sections, 2 figures, 2 tables, 2 algorithms.

Figures (2)

  • Figure 1: General neuromorphic Keyword Spotting pipeline and proposed shortcut.
  • Figure 2: Oversampling and network sparsity impact on classification accuracy.