Spike Encoding for Environmental Sound: A Comparative Benchmark
Andres Larroza, Javier Naranjo-Alcazar, Vicent Ortiz, Maximo Cobos, Pedro Zuccarello
TL;DR
This paper benchmarks three spike-encoding schemes, Threshold Adaptive Encoding (TAE), Step Forward (SF), and Moving Window (MW), for environmental sound processing with Spiking Neural Networks (SNNs). Using a multi-band Mel-spectrogram representation on three datasets (ESC-10, UrbanSound8K, TAU-3Class), TAE gives the best signal reconstruction quality and the lowest spike firing rates, which points to better energy efficiency. In downstream SNN classification, TAE achieves the highest accuracy on two datasets, though all encoders lag behind the original baselines, which highlights the need to co-design encoders and architectures. Overall, the work provides a foundational benchmark for selecting spike encoders in neuromorphic environmental audio. It has implications for edge-deployed, energy-conscious audio systems, and integration with attention-based SNNs may help close the remaining gap.
Abstract
Spiking Neural Networks (SNNs) offer energy-efficient processing suitable for edge applications, but conventional sensor data must first be converted into spike trains for neuromorphic processing. Environmental sound, including urban soundscapes, poses challenges due to variable frequencies, background noise, and overlapping acoustic events, while most spike-based audio encoding research has focused on speech. This paper analyzes three spike encoding methods, Threshold Adaptive Encoding (TAE), Step Forward (SF), and Moving Window (MW), across three datasets: ESC-10, UrbanSound8K, and TAU Urban Acoustic Scenes. Our multi-band analysis shows that TAE consistently outperforms SF and MW in reconstruction quality, both per frequency band and per class, across datasets. Moreover, TAE yields the lowest spike firing rates, indicating superior energy efficiency. For downstream environmental sound classification with a standard SNN, TAE also achieves the best performance among the compared encoders. Overall, this work provides foundational insights and a comparative benchmark to guide the selection of spike encoders for neuromorphic environmental sound processing.
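To make the two evaluation criteria (reconstruction quality and spike firing rate) concrete, the following is a minimal sketch of Step Forward encoding in its commonly described form, applied to a single band. It is not taken from the paper: the function names, the fixed threshold, and the toy signal are illustrative assumptions, and the authors' exact variant and parameter choices may differ. In a multi-band setting, each Mel band would presumably be encoded independently in the same way.

```python
import numpy as np

def step_forward_encode(signal, threshold):
    """Encode a 1-D signal into +1 / -1 / 0 spikes (hypothetical helper).

    A spike is emitted whenever the signal moves more than `threshold`
    away from a running baseline, which then steps toward the signal.
    """
    signal = np.asarray(signal, dtype=float)
    spikes = np.zeros(signal.shape[0], dtype=np.int8)
    baseline = signal[0]  # the baseline starts at the first sample
    for t in range(1, signal.shape[0]):
        if signal[t] > baseline + threshold:
            spikes[t] = 1
            baseline += threshold
        elif signal[t] < baseline - threshold:
            spikes[t] = -1
            baseline -= threshold
    return spikes

def step_forward_decode(spikes, threshold, start_value):
    """Reconstruct the signal by accumulating threshold-sized steps."""
    return start_value + threshold * np.cumsum(spikes)

def firing_rate(spikes):
    """Fraction of time steps that carry a spike of either polarity."""
    return np.count_nonzero(spikes) / spikes.size

# Toy band envelope; values and threshold are assumptions for illustration.
band = [0.0, 0.3, 0.6, 0.6, 0.2]
spikes = step_forward_encode(band, threshold=0.25)
reconstruction = step_forward_decode(spikes, threshold=0.25, start_value=band[0])
```

Comparing `reconstruction` against `band` (for example with a mean squared error) gives a reconstruction-quality score, while `firing_rate(spikes)` gives a simple proxy for spiking activity. A larger threshold generally lowers the firing rate at the cost of a coarser reconstruction.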
