Spiking-LEAF: A Learnable Auditory front-end for Spiking Neural Networks

Zeyang Song; Jibin Wu; Malu Zhang; Mike Zheng Shou; Haizhou Li

TL;DR

This work tackles the limited speech-processing performance of brain-inspired SNNs, which stems from suboptimal auditory front-ends. It proposes Spiking-LEAF, a fully learnable front-end that jointly optimizes a Gabor-filter-based acoustic feature extractor and an IHC-inspired two-compartment spiking neuron (IHC-LIF) enriched with lateral feedback. Training adds a spike-rate regularization term to the classification loss, $L = L_{cls} + \lambda L_{SR}$ with $L_{SR} = \mathrm{ReLU}(R - SR)$, where $R$ is the average firing rate, $SR$ the target spike rate, and $\lambda$ a weighting coefficient. The term penalizes firing above the target and thus promotes sparse, efficient encoding. On keyword spotting and speaker identification benchmarks, Spiking-LEAF achieves state-of-the-art accuracy, noise robustness, and encoding efficiency, outperforming both conventional Fbank features and prior spike-based front-ends. This indicates strong potential for ultra-low-power neuromorphic speech processing at the edge.
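
As a rough illustration of how the regularized objective combines its two terms, the NumPy sketch below computes $L = L_{cls} + \lambda L_{SR}$ for a single example. It assumes that $R$ is the mean firing rate over all time steps and channels of the front-end's spike output and that $SR$ is a fixed target rate. Softmax cross-entropy stands in for $L_{cls}$, and the function names are hypothetical, not taken from the authors' code.

```python
import numpy as np


def cross_entropy(logits, label):
    """Softmax cross-entropy for one example (stand-in for L_cls)."""
    z = logits - np.max(logits)               # shift for numerical stability
    log_probs = z - np.log(np.sum(np.exp(z)))
    return float(-log_probs[label])


def spike_rate_loss(spikes, target_rate):
    """L_SR = ReLU(R - SR): penalize only the firing rate above the target."""
    rate = float(np.mean(spikes))             # R: fraction of (time, channel) bins that spiked
    return max(0.0, rate - target_rate)


def total_loss(cls_loss, spikes, target_rate, lam):
    """L = L_cls + lambda * L_SR (lam is the hypothetical weighting coefficient)."""
    return cls_loss + lam * spike_rate_loss(spikes, target_rate)


# Usage: a toy 2-step x 4-channel spike map with average rate 0.25
spikes = np.array([[1, 0, 0, 0],
                   [0, 1, 0, 0]])
logits = np.array([2.0, 0.5, -1.0])
loss = total_loss(cross_entropy(logits, label=0), spikes, target_rate=0.1, lam=2.0)
print(loss)
```

In actual training the same arithmetic would typically be written with an autograd framework's tensors, using a surrogate gradient for the spike function, so that $L_{SR}$ back-propagates into the front-end parameters. The NumPy version only shows how the terms combine; because of the ReLU, a network already firing below $SR$ receives no regularization pressure.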

Abstract

Brain-inspired spiking neural networks (SNNs) have demonstrated great potential for temporal signal processing. However, their performance in speech processing remains limited due to the lack of an effective auditory front-end. To address this limitation, we introduce Spiking-LEAF, a learnable auditory front-end designed specifically for SNN-based speech processing. Spiking-LEAF combines a learnable filter bank with a novel two-compartment spiking neuron model called IHC-LIF. The IHC-LIF neurons draw inspiration from the structure of inner hair cells (IHCs) and leverage segregated dendritic and somatic compartments to capture the multi-scale temporal dynamics of speech signals. Additionally, the IHC-LIF neurons incorporate a lateral feedback mechanism which, together with a spike-rate regularization loss, enhances spike encoding efficiency. On keyword spotting and speaker identification tasks, the proposed Spiking-LEAF outperforms both state-of-the-art (SOTA) spiking auditory front-ends and conventional real-valued acoustic features in terms of classification accuracy, noise robustness, and encoding efficiency.
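
To make the two-compartment idea concrete, the sketch below simulates a generic discrete-time two-compartment LIF layer. A leaky dendritic compartment integrates the filter-bank input plus lateral feedback from the previous step's spikes; a leaky somatic compartment is driven by the dendrite and fires when it crosses a threshold, with a soft reset by subtraction. The update rules, the coefficient values (`alpha_d`, `alpha_s`, `beta`, `v_th`), the fixed lateral weight matrix `w_lat`, and the function name are illustrative assumptions, not the exact IHC-LIF equations or learned parameters from the paper.

```python
import numpy as np


def two_compartment_lif(inputs, alpha_d=0.9, alpha_s=0.8, beta=0.5,
                        w_lat=None, v_th=1.0):
    """Hypothetical two-compartment LIF layer with lateral feedback.

    inputs: array of shape (T, N), input current per time step and channel.
    Returns a binary spike array of shape (T, N).
    """
    inputs = np.asarray(inputs, dtype=float)
    num_steps, num_neurons = inputs.shape
    if w_lat is None:
        w_lat = np.zeros((num_neurons, num_neurons))  # no lateral coupling
    u_d = np.zeros(num_neurons)  # dendritic potential (slower decay: longer memory)
    u_s = np.zeros(num_neurons)  # somatic potential (faster decay: spike generation)
    s = np.zeros(num_neurons)    # spikes emitted at the previous step
    spikes = np.zeros((num_steps, num_neurons))
    for t in range(num_steps):
        # Dendrite: leaky integration of input plus lateral feedback from last spikes
        u_d = alpha_d * u_d + inputs[t] + w_lat @ s
        # Soma: leaky integration of dendritic drive; soft reset subtracts the
        # threshold if the neuron spiked at the previous step
        u_s = alpha_s * u_s + beta * u_d - v_th * s
        s = (u_s >= v_th).astype(float)
        spikes[t] = s
    return spikes


# Usage: constant drive to 4 channels, with and without lateral inhibition
T, N = 50, 4
drive = np.full((T, N), 0.5)
inhibition = -0.5 * (np.ones((N, N)) - np.eye(N))  # each spike inhibits the other channels
free = two_compartment_lif(drive)
inhibited = two_compartment_lif(drive, w_lat=inhibition)
print(free.mean(), inhibited.mean())
```

With negative off-diagonal entries in `w_lat`, a spike in one channel lowers the dendritic potential of the others, so the inhibited layer fires fewer spikes than the uncoupled one on the same constant drive. This is one simple way lateral feedback can sparsify an encoding; in a learnable front-end such coefficients would be trained rather than fixed by hand.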

Paper Structure (9 sections, 5 equations, 4 figures, 2 tables)

Introduction
Methods
  Parameterized acoustic feature extraction
  Two-compartment spiking neuron model
  IHC-LIF neurons with lateral feedback
Experimental Results
  Superior feature representation
  Ablation studies
Conclusion

Figures (4)

Figure 1: The overall architecture of the proposed SNN-based speech processing framework.
Figure 2: Computational graphs of LIF and IHC-LIF neurons.
Figure 3: Test accuracy on the KWS task with varying SNRs.
Figure 4: Fbank features and the spike representations generated by Spiking-LEAF, without and with lateral inhibition and the spike-rate regularization loss.
