
Exploring neural oscillations during speech perception via surrogate gradient spiking neural networks

Alexandre Bittar, Philip N. Garner

TL;DR

This work investigates how a physiologically inspired spiking neural network (SNN), trained with surrogate gradients, can learn to process speech and, crucially, exhibit neural oscillations akin to those observed in the human auditory system. By training the SNN end to end within a hybrid ANN-SNN architecture, the authors demonstrate cross-frequency coupling, measured as phase-amplitude coupling (PAC), between delta/theta bands and gamma activity across layers, which correlates with improved speech recognition performance. The study further shows that spike-frequency adaptation, layer-wise recurrence, and Dale's law modulate these oscillations, while background noise fails to elicit significant PAC, highlighting a functional role for oscillations in auditory processing. The results have implications for neuromorphic computing: they connect oscillatory dynamics to efficient speech information processing and suggest design principles for energy-efficient on-device speech systems.

Abstract

Understanding cognitive processes in the brain demands sophisticated models capable of replicating neural dynamics at large scales. We present a physiologically inspired speech recognition architecture, compatible and scalable with deep learning frameworks, and demonstrate that end-to-end gradient descent training leads to the emergence of neural oscillations in the central spiking neural network. Significant cross-frequency couplings, indicative of these oscillations, are measured within and across network layers during speech processing, whereas no such interactions are observed when handling background noise inputs. Furthermore, our findings highlight the crucial inhibitory role of feedback mechanisms, such as spike frequency adaptation and recurrent connections, in regulating and synchronising neural activity to improve recognition performance. Overall, on top of developing our understanding of synchronisation phenomena notably observed in the human auditory pathway, our architecture exhibits dynamic and efficient information processing, with relevance to neuromorphic technology.

Paper Structure

This paper contains 46 sections, 12 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Kernel functions of AdLIF neuron model. Membrane potential response to an input pulse at $t=10$ ms (in blue) and to an emitted spike at $t=60$ ms (in orange). The neuron parameters are $\tau_u=5$ ms, $\tau_w=30$ ms, $a=0.5$ and $b=1.5$.
  • Figure 2: End-to-end trainable speech recognition pipeline. Input waveform is converted to a spike train representation to be processed by the central SNN before being transformed into output phoneme probabilities sent to a loss function for training.
  • Figure 3: Spiking activity in response to speech input. (A) Input filterbank features and resulting spike trains produced across layers. For each layer, the neurons are vertically sorted on the y-axis by increasing average firing rate (top to bottom). The model uses a 2 ms time step, 16 CNN channels, 3 layers of size 512, 50% AdLIF neurons, 100% feedforward and 50% recurrent connectivity with Dale's law. (B) Corresponding distribution of single neuron firing rates.
  • Figure 4: Cross-frequency coupling of population aggregated activity. (A) Population signals of auditory nerve fibers (blue) and last layer (orange) filtered in distinct frequency bands. (B) Modulation index and mean vector length metrics as measures of PAC between the theta band of the auditory nerve fibers and the low-gamma band of the last layer.
  • Figure 5: Spiking activity in an untrained network in response to speech input. (A) Input filterbank features and resulting spike trains produced across layers. The model uses a 2 ms time step, 16 CNN channels, 3 layers of size 512, 50% AdLIF neurons, 100% feedforward and 50% recurrent connectivity. (B) Corresponding single neuron firing rate distributions.
  • ...and 1 more figures
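
To make the AdLIF parameters quoted in the Figure 1 caption more concrete, the following is a minimal sketch of an adaptive leaky integrate-and-fire neuron in discrete time. The update rule is an assumption: it is one common exponential-decay discretisation of a leaky membrane potential `u` coupled to an adaptation variable `w`, and it is not taken from the paper's equations. The function name `simulate_adlif`, the threshold `theta = 1`, and the 1 ms step `dt` are likewise hypothetical choices for illustration. Only the default values of `tau_u`, `tau_w`, `a` and `b` come from the caption.

```python
import numpy as np


def simulate_adlif(current, dt=1.0, tau_u=5.0, tau_w=30.0,
                   a=0.5, b=1.5, theta=1.0):
    """Simulate one adaptive LIF neuron (illustrative sketch only).

    Assumed discrete update, NOT necessarily the paper's formulation:
        u[t] = alpha * (u[t-1] - theta * s[t-1]) + (1 - alpha) * (I[t] - w[t-1])
        w[t] = beta * w[t-1] + (1 - beta) * a * u[t-1] + b * s[t-1]
        s[t] = 1 if u[t] >= theta else 0
    Times are in milliseconds.
    """
    current = np.asarray(current, dtype=float)
    alpha = np.exp(-dt / tau_u)  # membrane decay factor per step
    beta = np.exp(-dt / tau_w)   # adaptation decay factor per step

    n = current.shape[0]
    u = np.zeros(n)  # membrane potential
    w = np.zeros(n)  # adaptation variable
    s = np.zeros(n)  # emitted spikes (0 or 1)

    u_prev = w_prev = s_prev = 0.0
    for t in range(n):
        # Leaky integration with soft reset (subtract theta after a spike);
        # the adaptation variable acts as an inhibitory feedback current.
        u[t] = alpha * (u_prev - theta * s_prev) + (1.0 - alpha) * (current[t] - w_prev)
        # Adaptation: sub-threshold coupling (a) and spike-triggered jump (b).
        w[t] = beta * w_prev + (1.0 - beta) * a * u_prev + b * s_prev
        s[t] = float(u[t] >= theta)
        u_prev, w_prev, s_prev = u[t], w[t], s[t]
    return u, w, s


# Sub-threshold response to a single input pulse at t = 10 ms.
pulse = np.zeros(100)
pulse[10] = 1.0
u_pulse, w_pulse, s_pulse = simulate_adlif(pulse)

# Constant drive, with and without adaptation, to compare spike counts.
drive = np.full(200, 3.0)
_, _, s_adapt = simulate_adlif(drive)
_, _, s_plain = simulate_adlif(drive, a=0.0, b=0.0)
```

Under these assumptions, setting `a` and `b` to zero reduces the model to a plain LIF neuron, so comparing `s_adapt.sum()` with `s_plain.sum()` gives a rough feel for how spike-frequency adaptation acts as inhibitory feedback on the firing rate.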