Hardware-aware training of models with synaptic delays for digital event-driven neuromorphic processors

Alberto Patino-Saucedo; Roy Meijer; Amirreza Yousefzadeh; Manil-Dev Gomony; Federico Corradi; Paul Detteter; Laura Garrido-Regife; Bernabe Linares-Barranco; Manolis Sifalakis

TL;DR

This work addresses the training and deployment of spiking neural networks (SNNs) with configurable per-synapse delays on digital neuromorphic hardware. It introduces a hardware-aware training framework that co-optimizes synaptic weights and delays using spatio-temporal back-propagation with surrogate gradients, together with a pruning strategy that removes delay connections and reallocates delays as needed. A core contribution is the Shared Circular Delay Queue (SCDQ), a memory- and area-efficient delay-acceleration structure for Seneca that shares delay handling across cores and layers, reducing memory overhead to $O(\alpha \cdot I \cdot D)$ and enabling per-axon delay support. The framework is validated on Intel Loihi and Imec Seneca using the SHD dataset. Hardware-executed models stay within about 1% of the accuracy of the software reference model and deliver substantial energy and latency benefits, especially for larger networks. The results demonstrate practical deployment of delay-parameterized SNNs on multicore neuromorphic accelerators and highlight the efficiency gains achievable with hardware-aware design.

Abstract

Configurable synaptic delays are a basic feature in many neuromorphic neural network hardware accelerators. However, they have rarely been used in model implementations, despite their promising impact on performance and efficiency in tasks that exhibit complex (temporal) dynamics, because it has been unclear how to optimize them. In this work, we propose a framework to train and deploy, on digital neuromorphic hardware, high-performing spiking neural network models (SNNs) in which the per-synapse delays are co-optimized along with the synaptic weights. Leveraging spike-based back-propagation-through-time, the training accounts for platform constraints, such as synaptic weight precision and the total number of parameters per core, as a function of the network size. In addition, a delay pruning technique is used to reduce the memory footprint at a low cost in performance. We evaluate trained models on two digital neuromorphic hardware platforms: Intel Loihi and Imec Seneca. Loihi offers synaptic delay support using the so-called Ring-Buffer hardware structure. Seneca does not provide native hardware support for synaptic delays. A second contribution of this paper is therefore a novel area- and memory-efficient hardware structure for the acceleration of synaptic delays, which we have integrated in Seneca. The evaluated benchmark involves several models for solving the SHD (Spiking Heidelberg Digits) classification task, where we demonstrate minimal accuracy degradation in the transition from software to hardware. To our knowledge, this is the first work showcasing how to train and deploy hardware-aware models parameterized with synaptic delays on multicore neuromorphic hardware accelerators.

Paper Structure (17 sections, 4 equations, 11 figures, 5 tables)

Introduction
Related Work
Methods
Delay Model Description
Training Framework
Hardware Model Deployment and Experimental Setup
Network Models and Dataset - SHD
Model deployment on Seneca and Loihi
Memory Efficient Synaptic Delay Acceleration (Seneca)
Shared Circular Delay Queue Architecture
Zero-Skipping Delay-Forwarding
Results
Fidelity of delay models on neuromorphic hardware
Power, Energy, Latency, Memory measurements
Final accuracy and effect of reduced bit precision
...and 2 more sections

Figures (11)

Figure 1: Weight-delay representation before (a) and after (b) pruning delay synapses.
Figure 2: Delay model training pipeline.
Figure 3: An example of the event flow over three timesteps in a two-layer SNN with synaptic delays. The maximum delay is 2, and the Shared Circular Delay Queue is positioned between the two layers. In timestep $t=0$, neurons $A$ and $B$ spike, and neuron $C$ receives spikes from neurons $A$ and $B$ with a delay value of 0. In timestep $t=1$, neuron $B$ spikes, and neuron $C$ receives a spike from neuron $A$ with a delay value of 1, from neuron $B$ with a delay value of 0, and from neuron $B$ with a delay value of 1. In timestep $t=2$, no neuron spikes, and neuron $C$ receives a spike from neuron $A$ with a delay value of 2, from neuron $B$ with a delay value of 2, and from neuron $B$ with a delay value of 1.
Figure 4: A mapping on the Seneca platform of a three-layer SNN with 700-wide input samples, 48 neurons in both hidden layers, 20 neurons in the output layer, and 60 delays between the hidden layers.
Figure 5: Example of $WVU$ for a network where delayed axons $A,2$, $B,0$ and $B,1$ are skipped.
...and 6 more figures
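
The event flow described for Figure 3 can be illustrated with a small behavioral model. The following is a minimal sketch, not the SCDQ hardware design: it assumes that every spike entering the queue is forwarded downstream once per timestep, tagged with the number of timesteps elapsed since it was emitted, until that value exceeds the maximum delay. The function name `simulate_delay_queue` and its parameters are hypothetical and chosen only for this illustration.

```python
from collections import deque


def simulate_delay_queue(spikes_per_step, max_delay):
    """Return, for each timestep, the sorted (neuron, delay) events forwarded downstream.

    Behavioral sketch only (assumed semantics, not the hardware implementation):
    a queued spike is re-emitted every timestep with delay = elapsed timesteps.
    """
    queue = deque()  # entries are [neuron, age]; oldest entries sit at the front
    delivered = []
    for spikes in spikes_per_step:
        # New spikes enter the queue with age 0, i.e. delay value 0.
        for neuron in spikes:
            queue.append([neuron, 0])
        # Forward every queued event once, tagged with its current age.
        delivered.append(sorted((neuron, age) for neuron, age in queue))
        # Age all events by one timestep.
        for entry in queue:
            entry[1] += 1
        # Events older than the maximum delay leave the queue (FIFO order
        # guarantees the oldest events are at the front).
        while queue and queue[0][1] > max_delay:
            queue.popleft()
    return delivered


# Spike pattern of the Figure 3 example: A and B spike at t=0, B at t=1, none at t=2.
events = simulate_delay_queue([["A", "B"], ["B"], []], max_delay=2)
for t, step_events in enumerate(events):
    print(f"t={t}: {step_events}")
```

In this sketch the downstream neuron would select, per incoming (neuron, delay) pair, the weight trained for that delay; pairs whose delay synapse was pruned would simply be ignored.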
