To Spike or Not to Spike, that is the Question
Sanaz Mahmoodi Takaghaj, Jack Sampson
TL;DR
This work tackles the challenge of training spiking neural networks (SNNs) where spike generation is non-differentiable and highly sensitive to the neuron threshold. It introduces Rouser, an approach that treats neuron thresholds $Th^{l}_{i}$ as trainable parameters, optimizing them jointly with synaptic weights through spatiotemporal backpropagation with surrogate gradients. Empirical results on NMNIST, DVS128, and SHD show up to $30\%$ faster convergence in epochs and up to $2\%$ improvements in accuracy, along with a reduction in dead neurons and more robust learning dynamics. The method promises more reliable and efficient SNN training across neuromorphic platforms, enabling better real-time, event-driven processing.
Abstract
Neuromorphic computing has recently gained momentum with the emergence of various neuromorphic processors. As the field advances, there is an increasing focus on developing training methods that can effectively leverage the unique properties of spiking neural networks (SNNs). SNNs emulate the temporal dynamics of biological neurons, making them particularly well-suited for real-time, event-driven processing. To fully harness the potential of SNNs across different neuromorphic platforms, effective training methodologies are essential. In SNNs, learning rules are based on neurons' spiking behavior, that is, whether and when a spike is generated because a neuron's membrane potential exceeds that neuron's spiking threshold; this spike timing encodes vital information. However, the threshold is generally treated as a hyperparameter, and an incorrect choice can leave neurons silent for large portions of the training process, reducing the effective rate of learning. This work focuses on the significance of learning neuron thresholds alongside weights in SNNs. Our results suggest that promoting the threshold from a hyperparameter to a trainable parameter effectively addresses the issue of dead neurons during training. This leads to a more robust training algorithm, resulting in improved convergence, increased test accuracy, and a substantial reduction in the number of training epochs required to achieve viable accuracy on spatiotemporal datasets such as NMNIST, DVS128, and Spiking Heidelberg Digits (SHD), with up to 30% training speed-up and up to 2% higher accuracy.
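To make the idea of a trainable threshold concrete, the following is a minimal sketch for a single leaky integrate-and-fire neuron. It is not the authors' Rouser implementation. The sigmoid-derivative surrogate, the hard reset (with the reset path detached from the gradient), the spike-count loss, and all names and constants (`run_neuron`, `gradients`, `train_until_spiking`, `decay`, `beta`, `lr`, `target`) are assumptions chosen for illustration. The neuron starts "dead" (its threshold is too high for the given input), and gradient descent on both the weight and the threshold is applied until it emits a spike.

```python
import math

def surrogate_grad(x, beta=2.0):
    """Sigmoid derivative, used in place of the undefined Heaviside derivative."""
    s = 1.0 / (1.0 + math.exp(-beta * x))
    return beta * s * (1.0 - s)

def run_neuron(inputs, w, th, decay=0.9):
    """Simulate one LIF neuron; return spikes and pre-reset membrane potentials."""
    v, spikes, potentials = 0.0, [], []
    for x in inputs:
        u = decay * v + w * x            # leaky integration of the weighted input
        s = 1.0 if u >= th else 0.0      # spike when the potential reaches the threshold
        potentials.append(u)
        spikes.append(s)
        v = u * (1.0 - s)                # hard reset after a spike
    return spikes, potentials

def gradients(inputs, w, th, target, decay=0.9):
    """Backpropagate a spike-count loss 0.5 * (count - target)^2 through time.

    Assumption: the reset gate is treated as a constant (detached) in the backward pass.
    """
    spikes, potentials = run_neuron(inputs, w, th, decay)
    err = sum(spikes) - target           # dLoss/ds_t is the same for every time step
    g_next, dw, dth = 0.0, 0.0, 0.0
    for x, s, u in zip(reversed(inputs), reversed(spikes), reversed(potentials)):
        sg = surrogate_grad(u - th)
        # gradient w.r.t. u_t: direct spike path plus the path through later time steps
        g_u = err * sg + (1.0 - s) * decay * g_next
        dw += g_u * x
        dth += -err * sg                 # ds/dth is approximated by -surrogate_grad(u - th)
        g_next = g_u
    return dw, dth

def train_until_spiking(inputs, w, th, target, lr=0.02, max_steps=100):
    """Update weight and threshold jointly until the neuron emits at least one spike."""
    for _ in range(max_steps):
        spikes, _ = run_neuron(inputs, w, th)
        if sum(spikes) > 0:
            break
        dw, dth = gradients(inputs, w, th, target)
        w -= lr * dw
        th -= lr * dth
    return w, th

inputs = [1.0] * 20                      # constant input current over 20 time steps
w0, th0 = 0.5, 6.0                       # potential saturates below 6.0, so no spikes
w1, th1 = train_until_spiking(inputs, w0, th0, target=5)
print("spikes before:", sum(run_neuron(inputs, w0, th0)[0]))
print("spikes after: ", sum(run_neuron(inputs, w1, th1)[0]))
print("weight:", w0, "->", w1, " threshold:", th0, "->", th1)
```

Because the surrogate gradient is nonzero even when the neuron is silent, the loss still provides a signal that lowers the threshold and raises the weight. With a fixed threshold, only the weight term would be available to revive the neuron.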
