Towards Scalable GPU-Accelerated SNN Training via Temporal Fusion
Yanchen Li, Jiachun Li, Kebin Sun, Luziwei Leng, Ran Cheng
TL;DR
This work tackles the slow training of Spiking Neural Networks (SNNs) on GPUs, which stems from their step-by-step temporal dynamics. It introduces temporal fusion, which decouples and fuses the propagation of Leaky Integrate-and-Fire (LIF) neurons so that each layer is processed across all time steps on a single GPU, and extends the idea to multiple GPUs through pipeline parallelism. The authors present a CUDA-based implementation integrated with PyTorch, derive a theoretical speedup model, and report 5×–40× speedups on static and event-based benchmarks while preserving accuracy. They further analyze time-step scalability and multi-GPU performance, showing that the benefit grows with the number of time steps and that the optimal GPU count is near $\sqrt{T_s/T_c}$. The approach promises scalable SNN training on commodity GPUs, supporting larger temporal horizons and bridging SNN research with practical deployment.
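The layer-wise processing described in the TL;DR can be illustrated with a minimal pure-Python sketch. It is not the authors' CUDA implementation. The discretized LIF update (time constant `tau`, threshold `v_th`, hard reset to zero), the fully connected layers, and all function names are assumptions chosen for illustration. The sketch shows only why the loop order can be swapped: a layer's membrane potential depends on its own previous state and on the spikes of the layer below at the same time step, so iterating over layers first and over time second gives the same spikes as the usual step-by-step order.

```python
import random

def lif_step(v, x, tau=2.0, v_th=1.0):
    """One assumed LIF update for a vector of neurons: leak, integrate, fire, hard reset."""
    v_new, spikes = [], []
    for v_i, x_i in zip(v, x):
        h = v_i + (x_i - v_i) / tau      # leaky integration of the input current
        s = 1.0 if h >= v_th else 0.0    # Heaviside firing at the threshold
        v_new.append(h * (1.0 - s))      # hard reset to 0 for neurons that fired
        spikes.append(s)
    return v_new, spikes

def linear(w, x):
    """Dense layer without bias: w is a list of rows, one row per output neuron."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]

def forward_step_by_step(weights, inputs):
    """Conventional order: outer loop over time steps, inner loop over layers."""
    v = [[0.0] * len(w) for w in weights]   # one membrane state vector per layer
    outputs = []
    for x in inputs:
        for l, w in enumerate(weights):
            v[l], x = lif_step(v[l], linear(w, x))
        outputs.append(x)
    return outputs

def forward_layer_by_layer(weights, inputs):
    """Swapped order: each layer consumes the whole input sequence before the next starts."""
    seq = inputs
    for w in weights:
        v = [0.0] * len(w)
        out = []
        for x in seq:                        # the per-layer time loop
            v, s = lif_step(v, linear(w, x))
            out.append(s)
        seq = out
    return seq

rng = random.Random(0)
sizes = [4, 8, 8, 3]                         # input width followed by three layer widths
weights = [[[rng.uniform(-1.0, 2.0) for _ in range(n_in)] for _ in range(n_out)]
           for n_in, n_out in zip(sizes, sizes[1:])]
inputs = [[rng.random() for _ in range(sizes[0])] for _ in range(16)]  # 16 time steps

out_a = forward_step_by_step(weights, inputs)
out_b = forward_layer_by_layer(weights, inputs)
same = out_a == out_b                        # both orders perform identical arithmetic
```

In the swapped order, the inner time loop touches only one layer's state. A per-layer loop of this kind is a natural candidate for fusion into a single GPU kernel, and consecutive layers could be assigned to different devices in a pipeline. How the paper realizes either step is not shown here.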
Abstract
Drawing on the intricate structures of the brain, Spiking Neural Networks (SNNs) emerge as a transformative development in artificial intelligence, closely emulating the complex dynamics of biological neural networks. While SNNs show promising efficiency on specialized sparse-computational hardware, their practical training often relies on conventional GPUs. This reliance frequently leads to longer computation times than for traditional Artificial Neural Networks (ANNs), presenting a significant hurdle for advancing SNN research. To address this challenge, we present a novel temporal fusion method designed to accelerate the propagation dynamics of SNNs on GPU platforms; it enhances the existing major approaches to deep learning tasks with SNNs. We validated the method through extensive experiments in both real training scenarios and idealized conditions, confirming its efficacy and adaptability on single- and multi-GPU systems. Benchmarked against various existing SNN libraries and implementations, our method achieves speedups ranging from $5\times$ to $40\times$ on NVIDIA A100 GPUs. The experimental code is publicly available at https://github.com/EMI-Group/snn-temporal-fusion.
