ED-sKWS: Early-Decision Spiking Neural Networks for Rapid and Energy-Efficient Keyword Spotting
Zeyang Song, Qianhui Liu, Qu Yang, Yizhou Peng, Haizhou Li
TL;DR
The paper tackles rapid, energy-efficient keyword spotting (KWS) on edge devices using Spiking Neural Networks (SNNs) with an early-decision capability that can output a result before the utterance ends. It introduces a Cumulative Temporal (CT) loss that optimizes predictions at every timestep through the accumulated output $O[t]=\sum_{i=0}^t \mathrm{softmax}(U_R[i])$, with $L_{CT}=\frac{1}{T}\sum_{t=0}^T L_{CE}[O[t], y]$. A new SC-100 dataset, covering 100 speech commands annotated with begin and end timestamps, enables evaluation of early-decision timing. On Google Speech Commands v2 and SC-100, ED-sKWS maintains competitive accuracy while using about $61\%$ of the timesteps and about $52\%$ of the energy of SNN models without an early-decision mechanism.
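To make the CT loss formula concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes that $U_R[i]$ is a per-timestep vector of readout values (one entry per class) and that $L_{CE}$ is evaluated on $O[t]$ after rescaling it to sum to one; the summary does not state how $O[t]$ is normalized, so that step is an assumption. The function names `softmax` and `ct_loss` are hypothetical, and the loss is averaged over the number of timesteps supplied.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def ct_loss(readout, label):
    """Cumulative Temporal loss sketch.

    readout: list of per-timestep vectors U_R[t] (one float per class).
    label:   integer index of the correct class y.
    """
    num_classes = len(readout[0])
    accumulated = [0.0] * num_classes  # O[t], the running sum of softmax outputs
    loss_sum = 0.0
    for step_values in readout:
        probs = softmax(step_values)
        accumulated = [a + p for a, p in zip(accumulated, probs)]
        # Assumption: rescale O[t] to a probability distribution before
        # taking the cross-entropy against the label.
        norm = sum(accumulated)
        loss_sum += -math.log(accumulated[label] / norm)
    # Average the per-timestep cross-entropy terms.
    return loss_sum / len(readout)
```

Because every timestep contributes a cross-entropy term, a readout that becomes correct early is penalized less than one that only becomes correct at the final timestep, which is the property that makes early stopping viable.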
Abstract
Keyword Spotting (KWS) is essential in edge computing, where rapid and energy-efficient responses are required. Spiking Neural Networks (SNNs) are well suited to KWS because of their efficiency and their temporal capacity for speech. To further reduce latency and energy consumption, this study introduces ED-sKWS, an SNN-based KWS model with an early-decision mechanism that can stop speech processing and output the result before the end of the speech utterance. Furthermore, we introduce a Cumulative Temporal (CT) loss that enhances prediction accuracy at both the intermediate and final timesteps. To evaluate early-decision performance, we present the SC-100 dataset, which includes 100 speech commands with beginning and end timestamp annotations. Experiments on the Google Speech Commands v2 and our SC-100 datasets show that ED-sKWS maintains competitive accuracy with 61% of the timesteps and 52% of the energy consumption of SNN models without an early-decision mechanism, ensuring rapid response and energy efficiency.
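The abstract does not specify the stopping rule behind the early-decision mechanism, so the sketch below illustrates only the general idea with a hypothetical rule: stop as soon as the leading class holds at least a `threshold` share of the accumulated softmax output. The names `early_decision` and `threshold`, and the confidence criterion itself, are assumptions for illustration and may differ from what ED-sKWS actually does.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def early_decision(readout, threshold=0.9):
    """Return (predicted_class, timesteps_used) for a stream of readout vectors.

    Hypothetical rule: stop once the top class accounts for at least
    `threshold` of the accumulated softmax mass.
    """
    accumulated = [0.0] * len(readout[0])
    steps_used = 0
    for step_values in readout:
        steps_used += 1
        probs = softmax(step_values)
        accumulated = [a + p for a, p in zip(accumulated, probs)]
        best = max(range(len(accumulated)), key=lambda k: accumulated[k])
        confidence = accumulated[best] / sum(accumulated)
        if confidence >= threshold:
            # Confident enough: skip the remaining timesteps.
            return best, steps_used
    # Threshold never reached: fall back to the decision at the final timestep.
    best = max(range(len(accumulated)), key=lambda k: accumulated[k])
    return best, steps_used
```

For example, `early_decision([[0, 0], [4, 0], [4, 0], [4, 0]], threshold=0.8)` stops after three of the four timesteps. The fraction of timesteps skipped in this way is the kind of saving the 61% figure describes, although the reported numbers come from the authors' own mechanism and not from this rule.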
