Surrogate Gradient Learning in Spiking Neural Networks

Emre O. Neftci, Hesham Mostafa, Friedemann Zenke

TL;DR

Surrogate-gradient learning provides a practical framework for training spiking neural networks (SNNs): the ill-defined derivative of the non-differentiable spiking nonlinearity is replaced with a smooth surrogate, which enables gradient-based training of deep, time-dependent architectures. By mapping SNNs to recurrent neural networks, the paper surveys smoothing approaches (soft nonlinearities, probabilistic models, rate coding, single-spike timing) and surrogate derivatives, and discusses locality-aware variants that suit neuromorphic hardware. It covers a spectrum of learning strategies, from full backpropagation through time to local and forward methods, along with applications such as random feedback alignment, local-error learning, and spike-time-based learning. The work highlights the potential for end-to-end, energy-efficient neuromorphic computing and bridges machine learning, computational neuroscience, and hardware design.
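The core idea in the summary above can be illustrated with a minimal sketch. It is not the authors' implementation: the single-neuron, single-time-step setup, the squared-error loss, the fast-sigmoid-style surrogate derivative, and all names and parameter values (`beta`, `lr`, `train`, and so on) are assumptions chosen for illustration. The forward pass keeps the hard threshold, and only the derivative used in the weight update is replaced.

```python
def heaviside(u, threshold=1.0):
    """Forward pass: emit a spike (1.0) once the membrane potential reaches threshold."""
    return 1.0 if u >= threshold else 0.0


def surrogate_derivative(u, threshold=1.0, beta=10.0):
    """Backward pass: smooth stand-in for the derivative of the step function.

    Assumed form (illustrative): 1 / (1 + beta * |u - threshold|)^2.
    """
    return 1.0 / (1.0 + beta * abs(u - threshold)) ** 2


def loss_and_surrogate_grad(w, x, target, threshold=1.0, beta=10.0):
    """Toy model: u = w * x, s = H(u - threshold), loss = 0.5 * (s - target)^2."""
    u = w * x
    s = heaviside(u, threshold)
    loss = 0.5 * (s - target) ** 2
    # Chain rule, with ds/du replaced by the surrogate derivative.
    grad_w = (s - target) * surrogate_derivative(u, threshold, beta) * x
    return loss, grad_w


def train(w=0.2, x=1.0, target=1.0, lr=0.5, steps=200):
    """Plain gradient descent on the surrogate gradient (hypothetical settings)."""
    loss = None
    for _ in range(steps):
        loss, grad_w = loss_and_surrogate_grad(w, x, target)
        if loss == 0.0:
            break  # the neuron now spikes as requested
        w -= lr * grad_w
    return w, loss


if __name__ == "__main__":
    w_final, final_loss = train()
    print(w_final, final_loss)
```

In this toy setting the true gradient of the loss with respect to `w` is zero everywhere except at the threshold, so ordinary gradient descent would never move the weight. The surrogate derivative is non-zero below threshold and grows as the membrane potential approaches it, which is what lets the update make progress.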

Abstract

Spiking neural networks are nature's versatile solution to fault-tolerant and energy-efficient signal processing. To translate these benefits into hardware, a growing number of neuromorphic spiking neural network processors attempt to emulate biological neural networks. These developments have created an imminent need for methods and tools to enable such systems to solve real-world signal processing problems. Like conventional neural networks, spiking neural networks can be trained on real, domain-specific data. However, their training requires overcoming a number of challenges linked to their binary and dynamical nature. This article elucidates step-by-step the problems typically encountered when training spiking neural networks, and guides the reader through the key concepts of synaptic plasticity and data-driven learning in the spiking setting. To that end, it gives an overview of existing approaches and provides an introduction to surrogate gradient methods, specifically, as a particularly flexible and efficient method to overcome the aforementioned challenges.

Paper Structure

This paper contains 20 sections, 15 equations, 6 figures.

Figures (6)

  • Figure 1: Example of a surrogate gradient for an SNN classifier. (a) Value of the loss function (gray) of an SNN classifier along an interpolation path over the hidden layer parameters $\mathbf{W}^{(1)}$. Specifically, we linearly interpolated between the random initial and final (post-optimization) weight matrices of the hidden layer inputs $\mathbf{W}^{(1)}$ (network details: 2 input, 2 hidden, and 2 output units trained on a binary classification task). Note that the loss function (gray) displays characteristic plateaus with zero gradient, which are detrimental for numerical optimization. (b) Norm of hidden layer (surrogate) gradients in arbitrary units along the interpolation path. To perform numerical optimization in this network we constructed a surrogate gradient (violet) which, in contrast to the true gradient (gray), is non-zero. Note that we obtained the "true gradient" via the finite differences method, which is itself an approximation. Importantly, the surrogate gradient approximates the true gradient, but retains favorable properties for optimization, i.e. continuity and finiteness. The surrogate gradient can be thought of as the gradient of a virtual surrogate loss function (violet curve in (a); obtained by numerical integration of the surrogate gradient and scaled to match the loss at the initial and final points). This surrogate loss remains virtual because it is generally not computed explicitly. In practice, suitable surrogate gradients are obtained directly from the gradients of the original network through sensible approximations. This is a key difference with respect to some other approaches (huh_gradient_2018) in which the entire network is replaced explicitly by a surrogate network on which gradient descent can be performed using its true gradients.
  • Figure 2: Deep Continuous Local Learning (DCLL) with spikes (Kaiser_etal18_synaplas), applied to the event-based DVSGestures dataset. The feed-forward weights (green) of a three-layer convolutional SNN are trained with surrogate gradients using local errors generated by fixed random projections to a local classifier. Learning in DCLL scales linearly with the number of neurons thanks to local rate-based cost functions formed by spike-based basis functions. The circular arrows indicate recurrence due to the statefulness of the LIF dynamics (no recurrent synaptic connections were used here) and are not trained. This SNN outperforms BPTT methods (shrestha_slayer:_2018), requiring fewer training iterations (Kaiser_etal18_synaplas) compared to other approaches.
  • Figure 3: Temporal XOR problem. (a) An SNN with one hidden layer. Each input neuron emits one spike which can either be late or early resulting in four possible input patterns that should be classified into two classes. (b) For the four input spike patterns (one per row), the right plots show the membrane potentials of the two output neurons, while the left plots show the membrane potentials of the four hidden neurons. Arrows at the top of the plot indicate output spikes from the layer, while arrows at the bottom indicate input spikes. The output spikes of the hidden layer are the input spikes of the output layer. The classification result is encoded in the identity of the output neuron that spikes first.
  • Figure :
  • Figure : "Unrolled" RNN
  • ...and 1 more figure