
DelRec: learning delays in recurrent spiking neural networks

Alexandre Queant, Ulysse Rançon, Benoit R Cottereau, Timothée Masquelier

TL;DR

DelRec tackles the challenge of training delays in recurrent spiking neural networks (RSNNs) by introducing a differentiable, surrogate-gradient-based method that learns axonal delays without a predefined maximum, using real-valued delays during training and a triangle-interpolation scheduling mechanism. The approach relies on a future-oriented spike-scheduling buffer and a gradually shrinking interpolation width, so that delays converge to discrete values while end-to-end backpropagation remains possible in RSNNs built from common neuron models such as LIF. Empirically, DelRec achieves state-of-the-art results on temporal datasets such as SSC and PS-MNIST with simple neurons and modest parameter budgets, and a functional study on SHD shows the advantages and tradeoffs of recurrent versus feedforward delays. The work underscores the importance of learnable recurrent delays for temporal processing in SNNs and offers a reproducible, hardware-friendly framework for neuromorphic deployment; the code is publicly available.
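A minimal sketch of the triangle-interpolation idea, in Python/NumPy. The kernel shape (a triangle of half-width $1 + \sigma$, normalized to sum to 1) and the linear decay of $\sigma$ over epochs are assumptions, chosen so that $\sigma = 0$ reduces to linear interpolation between the two integers nearest a real-valued delay $d$; the function name `triangle_spread` is hypothetical and not taken from the DelRec code.

```python
import numpy as np

def triangle_spread(d, sigma, max_offset):
    """Spread a unit value over integer offsets 0..max_offset around a real delay d.

    Hypothetical kernel: a triangle of half-width (1 + sigma), normalized to sum to 1.
    With sigma = 0 it reduces to linear interpolation between floor(d) and ceil(d).
    """
    offsets = np.arange(max_offset + 1)
    w = np.maximum(0.0, 1.0 - np.abs(offsets - d) / (1.0 + sigma))
    return w / w.sum()

# Assumed schedule: sigma decays linearly from sigma0 to 0 over training.
n_epochs, sigma0, d = 5, 3.0, 4.3
for epoch in range(n_epochs):
    sigma = sigma0 * (1 - epoch / (n_epochs - 1))
    print(epoch, round(sigma, 2), np.round(triangle_spread(d, sigma, 10), 3))

# Evaluation mode: the real-valued delay is rounded to the nearest integer.
print("eval delay:", int(round(d)))
```

Early epochs spread the value over many future steps around $d$; at $\sigma = 0$ only offsets 4 and 5 receive weight (0.7 and 0.3), whose weighted mean is exactly $d = 4.3$.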

Abstract

Spiking neural networks (SNNs) are a bio-inspired alternative to conventional real-valued deep learning models, with the potential for substantially higher energy efficiency. Interest in SNNs has recently exploded due to a major breakthrough: surrogate gradient learning (SGL), which allows training SNNs with backpropagation and strongly outperforms other approaches. In SNNs, each synapse is characterized not only by a weight but also by a transmission delay. While theoretical works have long suggested that trainable delays significantly enhance expressivity, practical methods for learning them have only recently emerged. Here, we introduce "DelRec", the first SGL-based method to train axonal or synaptic delays in recurrent spiking layers, compatible with any spiking neuron model. DelRec leverages a differentiable interpolation technique to handle non-integer delays with well-defined gradients at training time. We show that SNNs with trainable recurrent delays outperform feedforward ones, setting a new state of the art (SOTA) on two challenging temporal datasets (Spiking Speech Commands, an audio dataset, and Permuted Sequential MNIST, a vision one), and matching the SOTA on the now-saturated Spiking Heidelberg Digits dataset, using only vanilla Leaky-Integrate-and-Fire neurons with stateless (instantaneous) synapses. Our results demonstrate that recurrent delays are critical for temporal processing in SNNs and can be effectively optimized with DelRec, paving the way for efficient deployment on neuromorphic hardware with programmable delays. Our code is available at https://github.com/alexmaxad/DelRec.
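Why interpolation makes a delay trainable can be seen on a single spike: with a rounded integer delay, the input delivered at a given step is piecewise constant in $d$ (zero gradient almost everywhere), whereas linear interpolation between $\lfloor d \rfloor$ and $\lceil d \rceil$ has a nonzero derivative. The sketch below, in plain Python with hypothetical function names, checks the analytic derivative against a central finite difference; it illustrates the principle and is not DelRec's actual implementation.

```python
import math

def interp_kernel(t, d):
    """Input received at integer step t from a unit spike sent at step 0 with real delay d.

    Linear interpolation: the spike is split between floor(d) and ceil(d).
    """
    return max(0.0, 1.0 - abs(t - d))

def interp_kernel_grad(t, d):
    """Analytic derivative of interp_kernel with respect to d (valid for non-integer d)."""
    return math.copysign(1.0, t - d) if abs(t - d) < 1.0 else 0.0

def rounded_kernel(t, d):
    """Hard-rounded delay: piecewise constant in d, so its derivative is 0 almost everywhere."""
    return 1.0 if t == round(d) else 0.0

d, eps = 2.3, 1e-6
for t in (2, 3):
    fd_interp = (interp_kernel(t, d + eps) - interp_kernel(t, d - eps)) / (2 * eps)
    fd_round = (rounded_kernel(t, d + eps) - rounded_kernel(t, d - eps)) / (2 * eps)
    print(f"t={t}: analytic={interp_kernel_grad(t, d):+.1f}, "
          f"finite-diff interp={fd_interp:+.4f}, finite-diff rounded={fd_round:+.1f}")
```

Increasing $d$ moves weight from step 2 (derivative $-1$) to step 3 (derivative $+1$), which is exactly the signal a surrogate-gradient optimizer needs; the rounded version gives 0 at both steps.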

Paper Structure

This paper contains 21 sections, 39 equations, 3 figures, 6 tables, 1 algorithm.

Figures (3)

  • Figure 1: A: Optimizing a single delay in a recurrent connection can turn two recurrently connected neurons into a pattern generator. The panel shows two different behaviors of two neurons receiving the same inputs. Each neuron is recurrently connected to itself and to the other neuron, with a weight equal to 1. Each recurrent connection has a delay, indicated by the circled number on the connection. A neuron spikes if it receives strictly more than $1$ input spike. At time $t-1$, the neurons do not receive any input. The blue neuron receives an input spike at times $t$ and $t+3$, while the pink neuron receives two spikes, only at time $t+1$. Top: The inputs trigger one spike per neuron, so the pair works as a coincidence detector for spikes reaching the two neurons within a short time interval. Bottom: When the delay of the pink neuron's recurrent connection (blue arrow) is increased from $1$ to $3$ time steps, the same input triggers a regular and sustained firing pattern. B: Delays in recurrent connections reduce the risk of exploding or vanishing gradients by bridging distant time steps (see \ref{mitigate explo} for more details). Computational graphs of a vanilla RSNN with an intrinsic delay of $1$ time step in all recurrent connections (Top), and of an RSNN with different and longer delays in the recurrent connections (Bottom). Variables $H$, $S$ and $V$ are defined in Eqs. \ref{eq neuronal charge} to \ref{eq neuronal reset}.
  • Figure 2: A: At time step $t$, the neurons in the studied layer receive weighted spikes from the previous layer, which are summed with the inputs scheduled for time step $t$ in the scheduling matrix. The subsequent evolution of each neuron's internal state (membrane potential) may produce output spikes. B: Each neuron of the layer receives a weighted sum of the output spikes and schedules it over a spread of future time steps determined by its delay parameter $d$ and the spread $\sigma$ of the current epoch. Spread values are represented by the purple gradients. C: We narrow the spread function at each epoch by reducing its $\sigma$. At the beginning of training, the scheduled values are widely spread around the delay $d$. By the end of training, $\sigma$ is close to $0$ and the spread function only performs linear interpolation between the two integers closest to the real-valued delay. In evaluation mode, we then round the delay to the nearest integer.
  • Figure 3: A: The generic architecture of the networks used in this part. We use $2$ hidden layers; the inputs are simply linearly mapped to the first layer, and likewise for the readout. The linear mapping between the first and second hidden layers can incorporate feedforward delays, and the neurons in the second layer can have recurrent connections, possibly with delays. B: Histogram of model accuracy on SHD, with the standard error of the mean (SEM). The values on top of the bars are the numbers of parameters in the models. C: Model accuracy on SHD as a function of the number of parameters in the network (top), and as a function of the mean number of spikes per neuron per time step (bottom). The shaded areas represent the SEM. For more details, see \ref{ablation details}.
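The scheduling mechanism of Figure 2 can be sketched as the forward pass of one recurrent LIF layer in NumPy. Several choices here are assumptions rather than details taken from the DelRec code: one real-valued delay per receiving neuron (following panel B), the same hypothetical triangle kernel of half-width $1 + \sigma$ restricted to offsets $\geq 1$ (recurrent input arrives at the next step at the earliest), a hard reset, and a firing test $v \geq \theta$. Surrogate gradients and training are omitted; the names `triangle_spread` and `run_delayed_rsnn` are illustrative.

```python
import numpy as np

def triangle_spread(d, sigma, max_offset):
    # Hypothetical triangle kernel of half-width (1 + sigma) over offsets 1..max_offset.
    # Offset 0 is excluded: recurrent input reaches a neuron at the next step at the earliest.
    offsets = np.arange(1, max_offset + 1)
    w = np.maximum(0.0, 1.0 - np.abs(offsets - d) / (1.0 + sigma))
    return w / w.sum()

def run_delayed_rsnn(x, w_in, w_rec, delays, sigma, beta=0.9, theta=1.0):
    """One recurrent LIF layer with a real-valued recurrent delay per receiving neuron.

    x: (T, n_in) input spikes; w_in: (n, n_in); w_rec: (n, n); delays: (n,), each >= 1.
    Returns the output spikes, shape (T, n).
    """
    T, n = x.shape[0], w_in.shape[0]
    max_offset = int(np.ceil(delays.max() + 1 + sigma))
    spread = np.stack([triangle_spread(d, sigma, max_offset) for d in delays])  # (n, max_offset)
    buffer = np.zeros((T + max_offset + 1, n))  # scheduling matrix: future inputs per neuron
    v = np.zeros(n)
    out = np.zeros((T, n))
    for t in range(T):
        h = w_in @ x[t] + buffer[t]       # feedforward input + inputs scheduled for step t
        v = beta * v + h                  # leaky integration (charge)
        s = (v >= theta).astype(float)    # fire
        v = v * (1.0 - s)                 # hard reset
        out[t] = s
        r = w_rec @ s                     # weighted sum of recurrent spikes per receiving neuron
        # Schedule r_i on steps t+1 .. t+max_offset according to neuron i's delay spread.
        buffer[t + 1 : t + 1 + max_offset] += (spread * r[:, None]).T
    return out

# Usage: one self-connected neuron, a single input spike at t = 0.
x = np.zeros((10, 1))
x[0, 0] = 1.0
out = run_delayed_rsnn(x, w_in=np.array([[1.5]]), w_rec=np.array([[1.5]]),
                       delays=np.array([3.0]), sigma=0.0)
print(np.nonzero(out[:, 0])[0])  # spike times: [0 3 6 9]
```

With a self-connection of weight 1.5 and a delay of 3 steps, one input spike is enough to produce sustained firing every 3 steps, echoing the pattern-generator behavior of Figure 1A. Writing into a buffer of future inputs means each step only updates rows $t+1$ to $t+\text{max\_offset}$, so no history of past spikes has to be stored.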