All in one timestep: Enhancing Sparsity and Energy efficiency in Multi-level Spiking Neural Networks
Andrea Castagnetti, Alain Pegatoquet, Benoît Miramond
TL;DR
This work tackles information loss and energy efficiency in Spiking Neural Networks by introducing multi-level spiking neurons and a Sparse-ResNet architecture. The multi-level neuron increases per-timestep information throughput by allowing $z(t)\in[0,N]$, which yields $N\times T+1$ quantization levels over $T$ timesteps, while barrier neurons trained with a straight-through estimator (STE) mitigate the spike avalanche effect and improve gradient flow in residual paths. An energy model that accounts for memory accesses shows 2–3× energy savings on CIFAR-10/100 at 1 timestep and substantial latency compression on the neuromorphic CIFAR-10-DVS dataset, with Sparse-ResNet achieving comparable accuracy while reducing network activity by over 20%. Together, these advances enable high-accuracy, low-latency, and energy-efficient SNNs suitable for on-device neuromorphic deployment and real-time event-based processing.
Abstract
Spiking Neural Networks (SNNs) are one of the most promising bio-inspired neural network models and have drawn increasing attention in recent years. The event-driven communication mechanism of SNNs allows for sparse and theoretically low-power operation on dedicated neuromorphic hardware. However, the binary nature of instantaneous spikes also leads to considerable information loss in SNNs, resulting in accuracy degradation. To address this issue, we propose a multi-level spiking neuron model that provides both low quantization error and minimal inference latency while approaching the performance of full-precision Artificial Neural Networks (ANNs). Experimental results with popular network architectures and datasets show that multi-level spiking neurons provide better information compression, which allows latency to be reduced without performance loss. Compared to binary SNNs on image classification tasks, multi-level SNNs reduce energy consumption by a factor of 2 to 3, depending on the number of quantization intervals. On neuromorphic data, our approach drastically reduces the inference latency to 1 timestep, which corresponds to a compression factor of 10 compared to previously published results. At the architectural level, we propose a new residual architecture that we call Sparse-ResNet. Through a careful analysis of spike propagation in residual connections, we highlight a spike avalanche effect that affects most spiking residual architectures. Using our Sparse-ResNet architecture, we achieve state-of-the-art accuracy in image classification while reducing network activity by more than 20% compared to previous spiking ResNets.
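
To make the $N\times T+1$ quantization-level count from the TL;DR concrete, the following is a minimal sketch rather than the authors' implementation. It assumes an integrate-and-fire neuron with subtractive reset and a unit threshold that may emit an integer spike value between 0 and $N$ per timestep. The names `multilevel_if_neuron` and `count_levels` are hypothetical and chosen only for illustration.

```python
from itertools import product


def multilevel_if_neuron(inputs, n_levels, threshold=1.0):
    """Illustrative multi-level integrate-and-fire neuron (an assumption,
    not the paper's exact model).

    At each timestep the membrane potential integrates the input, emits an
    integer spike value z in {0, ..., n_levels}, and is reduced by
    z * threshold (subtractive reset).
    """
    v = 0.0
    spikes = []
    for x in inputs:
        v += x
        # Number of thresholds crossed, clipped to the allowed spike range.
        z = int(min(max(v // threshold, 0), n_levels))
        v -= z * threshold
        spikes.append(z)
    return spikes


def count_levels(n_levels, timesteps):
    """Count the distinct total spike counts reachable over `timesteps`
    steps when each step can emit a value in {0, ..., n_levels}."""
    totals = {
        sum(seq) for seq in product(range(n_levels + 1), repeat=timesteps)
    }
    return len(totals)


if __name__ == "__main__":
    # A binary neuron (N=1) over 4 timesteps and a multi-level neuron (N=4)
    # in a single timestep reach the same number of levels, N*T + 1.
    print(count_levels(1, 4), count_levels(4, 1))
    print(multilevel_if_neuron([2.5], n_levels=4))
    print(multilevel_if_neuron([2.5, 0.0, 0.0], n_levels=1))
```

Under these assumptions, a binary neuron needs several timesteps to convey the same accumulated value that a multi-level neuron emits in a single timestep, which is the intuition behind the latency reduction described in the abstract.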
