Hardware-aware training of models with synaptic delays for digital event-driven neuromorphic processors
Alberto Patino-Saucedo, Roy Meijer, Amirreza Yousefzadeh, Manil-Dev Gomony, Federico Corradi, Paul Detteter, Laura Garrido-Regife, Bernabe Linares-Barranco, Manolis Sifalakis
TL;DR
This work tackles the challenge of training and deploying spiking neural networks (SNNs) with configurable per-synapse delays on digital neuromorphic hardware. It introduces a hardware-aware training framework that co-optimizes synaptic weights and delays using spatio-temporal back-propagation with surrogate gradients, together with a pruning strategy that removes delay connections and reallocates delays as needed. A core contribution is the Shared Circular Delay Queue (SCDQ), a memory- and area-efficient delay-acceleration structure for Seneca that shares delay handling across cores and layers, reducing memory overhead to $O(\alpha \cdot I \cdot D)$ and enabling per-axon delay support. The framework is validated on Intel Loihi and Imec Seneca using the Spiking Heidelberg Digits (SHD) task, showing that hardware-executed models closely match their software counterparts (within about 1% accuracy) and deliver substantial energy and latency benefits, especially for larger networks. The results demonstrate practical deployment of delay-parameterized SNNs on multicore neuromorphic accelerators and highlight the efficiency gains achievable with hardware-aware design.
Abstract
Configurable synaptic delays are a basic feature in many neuromorphic neural network hardware accelerators. However, they have rarely been used in model implementations, despite their promising impact on performance and efficiency in tasks that exhibit complex (temporal) dynamics, because it has been unclear how to optimize them. In this work, we propose a framework to train and deploy, on digital neuromorphic hardware, high-performing spiking neural network models (SNNs) in which the per-synapse delays are co-optimized along with the synaptic weights. Leveraging spike-based back-propagation-through-time, the training accounts for platform constraints, such as synaptic weight precision and the total number of parameters per core, as a function of the network size. In addition, a delay pruning technique is used to reduce the memory footprint at a low cost in performance. We evaluate trained models on two digital neuromorphic hardware platforms: Intel Loihi and Imec Seneca. Loihi offers synaptic delay support using the so-called Ring-Buffer hardware structure. Seneca does not provide native hardware support for synaptic delays. A second contribution of this paper is therefore a novel area- and memory-efficient hardware structure for acceleration of synaptic delays, which we have integrated in Seneca. The evaluated benchmark involves several models for solving the SHD (Spiking Heidelberg Digits) classification task, where minimal accuracy degradation during the transition from software to hardware is demonstrated. To our knowledge, this is the first work showcasing how to train and deploy hardware-aware models parameterized with synaptic delays on multicore neuromorphic hardware accelerators.
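
To make the idea of a ring buffer for synaptic delays concrete, the following is a minimal software sketch of a generic per-neuron delay ring buffer. It is an illustration only: it does not describe Loihi's actual circuit, and it is not the delay structure proposed for Seneca. The class name `DelayRingBuffer`, its methods, and the parameters `num_neurons` and `max_delay` are hypothetical names chosen for this example.

```python
class DelayRingBuffer:
    """Generic ring buffer for delayed synaptic input (illustrative only)."""

    def __init__(self, num_neurons, max_delay):
        self.num_neurons = num_neurons
        self.max_delay = max_delay
        # One slot per possible arrival step: delays 0..max_delay.
        self.slots = [[0.0] * num_neurons for _ in range(max_delay + 1)]
        self.head = 0  # index of the slot that is delivered at the current step

    def schedule(self, target, weight, delay):
        """Accumulate a weighted spike that arrives `delay` steps from now."""
        if not 0 <= delay <= self.max_delay:
            raise ValueError("delay out of range")
        slot = (self.head + delay) % (self.max_delay + 1)
        self.slots[slot][target] += weight

    def advance(self):
        """Return the input arriving now, clear its slot, and move to the next step."""
        current = self.slots[self.head]
        self.slots[self.head] = [0.0] * self.num_neurons
        self.head = (self.head + 1) % (self.max_delay + 1)
        return current


# Usage: two neurons, delays of up to 3 time steps.
buf = DelayRingBuffer(num_neurons=2, max_delay=3)
buf.schedule(target=0, weight=0.5, delay=2)  # arrives two steps from now
buf.schedule(target=1, weight=1.0, delay=0)  # arrives at the current step
outputs = [buf.advance() for _ in range(4)]
```

In this sketch the storage grows with `num_neurons * (max_delay + 1)` regardless of how many spikes are actually in flight, which is the kind of memory cost that motivates looking for more compact delay structures.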
