DelGrad: Exact event-based gradients for training delays and weights on spiking neuromorphic hardware

Julian Göltz; Jimmy Weber; Laura Kriener; Sebastian Billaudelle; Peter Lake; Johannes Schemmel; Melika Payvand; Mihai A. Petrovici

TL;DR

DelGrad presents an exact, event-based gradient framework for jointly training transmission delays and synaptic weights in spiking neural networks, leveraging spike times exclusively and avoiding membrane-potential tracking. It derives analytic spike-time gradients, extends to multi-layer architectures, and integrates delay mechanisms (axonal, dendritic, synaptic) into the learning process, enabling end-to-end optimization with minimal hardware overhead. Empirically, DelGrad improves accuracy and parameter efficiency on a Yin–Yang classification task and demonstrates memory- and energy-friendly chip-in-the-loop training on BrainScaleS-2, with hardware-aware simulations capturing noise-induced gaps. This work enables precise temporal learning for neuromorphic hardware, reducing parameter counts while enhancing robustness to hardware variability and paving the way for delay-aware end-to-end SNN training.

Abstract

Spiking neural networks (SNNs) inherently rely on the timing of signals for representing and processing information. Incorporating trainable transmission delays, alongside synaptic weights, is crucial for shaping these temporal dynamics. While recent methods have shown the benefits of training delays and weights in terms of accuracy and memory efficiency, they rely on discrete time, approximate gradients, and full access to internal variables like membrane potentials. This limits their precision, efficiency, and suitability for neuromorphic hardware due to increased memory requirements and I/O bandwidth demands. To address these challenges, we propose DelGrad, an analytical, event-based method to compute exact loss gradients for both synaptic weights and delays. The inclusion of delays in the training process emerges naturally within our proposed formalism, enriching the model's search space with a temporal dimension. Moreover, DelGrad, grounded purely in spike timing, eliminates the need to track additional variables such as membrane potentials. To showcase this key advantage, we demonstrate the functionality and benefits of DelGrad on the BrainScaleS-2 neuromorphic platform by training SNNs in a chip-in-the-loop fashion. For the first time, we experimentally demonstrate the memory efficiency and accuracy benefits of adding delays to SNNs on noisy mixed-signal hardware. These experiments also reveal the potential of delays for stabilizing networks against noise. DelGrad opens a new way of training SNNs with delays on neuromorphic hardware, which results in fewer required parameters, higher accuracy, and ease of hardware training.

Paper Structure (34 sections, 43 equations, 14 figures, 6 tables)

Introduction
Training delays in SNNs with DelGrad
Spike time gradients
Extension to a multi-layer network
Delay implementation
Simulation results
Setup
Results
Hardware results
Setup
Results
Discussion
Additional simulation results
Deeper networks
Ablation studies
...and 19 more sections

Figures (14)

Figure 1: Information flow in an SNN. a) Network architecture of a feed-forward SNN with a spiking input layer at the bottom, a hidden layer in the middle and the output layer on top. While the methods described in this manuscript are applicable to many different network architectures, the structure depicted in a), with variable size of the hidden layer, is used in the following. b) Zoom-in on the information processing in a single leaky integrate-and-fire (LIF) neuron in the hidden layer. Incoming spikes (blue, bottom) are integrated by the neuron's membrane $u_{m}$ and generate postsynaptic potentials (PSPs), which accumulate additively. Once the membrane potential passes a threshold (gray dashed line), an output spike (orange, top) is generated and passed on to the neurons in the next layer. The PSP amplitudes are modulated by the respective synaptic weights $w$ (vertical red arrow); these are the parameters that are conventionally adapted during learning. Learnable transmission delays $d$ (horizontal red arrow) shift the PSPs in time, providing additional temporal processing power to the neuron. c) Zoom-out to a raster plot of the full spiking activity in the network. The information passed between the layers is encoded in the timing of the spikes. As sketched in the raster plot, the experiments in this manuscript employ time-to-first-spike (TTFS) coding, i.e., each neuron spikes only once; however, our method also generalizes to multi-spike scenarios (\ref{sec:si_math_multi}) if required by the task.
Figure 2: Computational graph of a multi-layer SNN with spike-time information encoding and adjustable delay and weight parameters. a) Graph for a multi-layer network with spike times $\mathbf{t}^0$ injected into the bottom ($1^\text{st}$) layer. In the forward pass (black arrows), each layer $l$ takes spike times as inputs and returns spike times as outputs that go into the next layer. The spike times of the topmost layer are used to compute the loss function $\mathcal{L}$. The backward pass (red dashed arrows) starts at the loss and passes the gradients backwards through the layers. We consider two types of layers: neuron layers and delay layers. b) Neuron layer with parameters $\mathbf{w}^l$ (synaptic weights). These are used together with the input spike times $\mathbf{t}^{l-1}$ to calculate the output spike times $\mathbf{t}^l$ according to the nonlinear relation described in \ref{eq:equalTimeEquation} and \ref{eq:doubleTimeEquation}. c) Delay layer with parameters $\mathbf{d}^l$ that are added (linearly) to the input spike times $\mathbf{t}^{l-1}$ to calculate the output spike times $\mathbf{t}^l$ as in \ref{eq:delayIsAddition}.
Figure 3: Illustrating different types of delays. a) From bottom to top: axonal delays shift the timing of the neuron's outgoing spikes by $d_\text{axo}$ (orange); synaptic delays shift the timing of spikes by a specific value $d_\text{syn}$ for each pair of pre- and post-synaptic neuron (purple); dendritic delays shift the timing of the incoming spikes into a neuron by $d_\text{den}$ (red). b) Vector and matrix representation of the different types of delays and their dimensionality as a function of the number of pre- and post-synaptic neurons. c) Equivalent effect of the dendritic and axonal delays on the output spike time of a neuron, due to the time-shift invariance of the temporal dynamics of a neuron. d) Schematic illustration of the location of synaptic, dendritic and axonal delay components in a generic neuromorphic crossbar architecture.
Figure 4: Classification task and simulation results. a) The Yin-Yang task \cite{kriener2021yin} consists of the classification of dots based on whether they belong to the Yin (red), Yang (blue), or dot (green) regions, as illustrated in \ref{fig:simulation}a. The input features are the two-dimensional coordinates $(x, y)$ of the image, along with their mirrored values $(1-x, 1-y)$, totaling four features. These features are encoded into spike times, such that a larger value of the $x$ or $y$ coordinate results in a later spike time for $x$ or $y$ and an earlier spike time for its mirrored version $1-x$ or $1-y$, respectively. For more details on the encoding, see the original publication \cite{kriener2021yin}. b) Test error as a function of the number of hidden neurons in an SNN, using different delay types. The solid lines and markers show the median of the error, and the shaded areas illustrate the spread over $10$ seeds. c) Same data as in b) but as a function of the number of trainable parameters in the networks, i.e., counting the distinct weights and, if applicable, delays. d) Impact of axonal delays as a function of the temporal scale of the dataset. The trainable delays cover a range $\lambda$ as indicated by the orange hue. The network performance without delays is shown in blue.
Figure 5: In-the-loop training with on-chip axonal delays on BrainScaleS-2. a) Schematic illustration of the network architecture for on-chip axonal delays; here, we apply this generic approach to the BrainScaleS-2 neuromorphic hardware. Each neuron in the network (black) is paired with a parrot neuron (orange) connected in a one-to-one scheme. The parrot neuron repeats each of its input spikes with a configurable delay. b) Photograph of the BrainScaleS-2 neuromorphic chip (taken from \cite{mueller2020bss2ll}). c) Median test errors and their spread on the Yin-Yang dataset when training network weights and axonal delays (orange) or only weights (blue). The dash-dotted lines indicate a hardware-aware simulation (cf. \ref{sec:si_hardware_aware_sim}) and the dotted lines the hardware emulation results. For comparison, we also show the ideal software simulation results from \ref{fig:simulation}b in gray. The shaded areas indicate the spread over 10 runs with different seeds. The values for networks with 30 hidden neurons (highlighted by the dashed box) are shown for a better comparison in panel d. d) Detailed comparison of performances at 30 hidden neurons of an ideal simulation, hardware-aware simulation and emulation on neuromorphic hardware.
...and 9 more figures
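
The captions of Figures 2c and 3 describe delays as parameters that are added linearly to spike times, with axonal delays acting per pre-synaptic neuron, dendritic delays per post-synaptic neuron, and synaptic delays per pre/post pair. The following is a minimal Python sketch of that bookkeeping only; it is not the authors' implementation. The function names `arrival_times` and `delay_parameter_count` and the nested-list layout are assumptions made for illustration, and the neuron layer (the nonlinear spike-time relation) is deliberately left out.

```python
def arrival_times(t_pre, n_post, d_axo=None, d_syn=None, d_den=None):
    """Return arrivals[i][j]: time at which the spike of pre-synaptic
    neuron i reaches post-synaptic neuron j after all delays are added.

    Hypothetical helper. Assumed shapes, following Figure 3b:
      d_axo: one delay per pre-synaptic neuron   (length n_pre)
      d_den: one delay per post-synaptic neuron  (length n_post)
      d_syn: one delay per pre/post pair         (n_pre x n_post)
    """
    arrivals = []
    for i, t in enumerate(t_pre):
        row = []
        for j in range(n_post):
            shift = 0.0
            if d_axo is not None:
                shift += d_axo[i]
            if d_syn is not None:
                shift += d_syn[i][j]
            if d_den is not None:
                shift += d_den[j]
            # A delay layer is a plain addition, so the derivative of the
            # arrival time with respect to the input spike time and to each
            # delay that applies to it is exactly 1.
            row.append(t + shift)
        arrivals.append(row)
    return arrivals


def delay_parameter_count(n_pre, n_post, kind):
    """Number of trainable delay parameters for one layer-to-layer projection."""
    counts = {
        "axonal": n_pre,
        "dendritic": n_post,
        "synaptic": n_pre * n_post,
    }
    return counts[kind]


# Two pre-synaptic spikes projecting onto three post-synaptic neurons.
t_pre = [1.0, 2.0]
axo = arrival_times(t_pre, n_post=3, d_axo=[0.5, 0.25])
den = arrival_times(t_pre, n_post=2, d_den=[0.5, 1.0])
```

In this sketch an axonal delay shifts a spike identically for every target neuron, while a dendritic delay shifts all inputs of one target neuron by the same amount. The parameter count shows why axonal and dendritic delays are much cheaper than synaptic ones: they grow with the number of neurons rather than with the number of connections.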
