DS-ATGO: Dual-Stage Synergistic Learning via Forward Adaptive Threshold and Backward Gradient Optimization for Spiking Neural Networks
Jiaqiang Jiang, Wenfeng Xu, Jing Fan, Rui Yan
TL;DR
The paper addresses training instability in spiking neural networks (SNNs): membrane potential distributions shift during training, so a fixed firing threshold and a fixed surrogate gradient (SG) become misaligned with them. It proposes DS-ATGO, a dual-stage approach that combines forward adaptive thresholding with backward threshold-driven SG optimization to maintain balanced firing and robust spatio-temporal gradients. Empirical results on CIFAR-10/100, CIFAR-10-DVS, and ImageNet show higher accuracy, more stable firing rates, and greater gradient availability in deeper layers, with favorable energy and latency characteristics. This joint mechanism, which tracks membrane potential dynamics (MPD), improves the practicality of SNNs for complex, real-world tasks in neuromorphic computing.
Abstract
Brain-inspired spiking neural networks (SNNs) are recognized as a promising avenue for achieving efficient, low-energy neuromorphic computing. Direct training of SNNs typically relies on surrogate gradient (SG) learning to estimate derivatives of non-differentiable spiking activity. However, during training, the distribution of neuronal membrane potentials varies across timesteps and progressively drifts away from the firing threshold on both sides. When the firing threshold and SG remain fixed, this drift can lead to imbalanced spike firing and diminished gradient signals, which degrades SNN performance. To address these issues, we propose a novel dual-stage synergistic learning algorithm that achieves forward adaptive thresholding and backward dynamic SG. In forward propagation, we adaptively adjust thresholds based on the distribution of membrane potential dynamics (MPD) at each timestep, which enriches neuronal diversity and effectively balances firing rates across timesteps and layers. In backward propagation, drawing on the underlying association between MPD, threshold, and SG, we dynamically optimize SG to enhance gradient estimation through spatio-temporal alignment, effectively mitigating gradient information loss. Experimental results demonstrate that our method achieves significant performance improvements. Moreover, it allows neurons to fire stable proportions of spikes at each timestep and increases the proportion of neurons that obtain gradients in deeper layers.
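To make the two stages concrete, the toy NumPy sketch below contrasts a fixed threshold and fixed SG window with ones that follow the membrane potential statistics at each timestep. It is only an illustration under stated assumptions, not the paper's formulation: the abstract gives no update rules, so the threshold rule (mean plus k standard deviations), the rectangular SG whose width scales with the standard deviation, the drifting Gaussian potentials, and all function names are hypothetical.

```python
import numpy as np


def adaptive_threshold(u, k=1.0):
    # Assumed rule (not from the paper): place the threshold k standard
    # deviations above the mean of the current membrane potentials.
    return u.mean() + k * u.std()


def rect_surrogate(u, theta, width):
    # Rectangular surrogate gradient: 1/width inside a window of the given
    # width centred on the threshold theta, and 0 outside it.
    return (np.abs(u - theta) < width / 2).astype(u.dtype) / width


def simulate(num_neurons=10000, seed=0):
    rng = np.random.default_rng(seed)
    # Toy membrane potentials whose mean and spread drift over 4 timesteps
    # (hypothetical values chosen only to mimic a shifting distribution).
    drift = [(0.0, 0.5), (0.5, 1.0), (1.5, 2.0), (3.0, 4.0)]
    rows = []
    for mean, std in drift:
        u = rng.normal(mean, std, size=num_neurons)
        theta = adaptive_threshold(u)
        width = 2.0 * u.std()  # assumed: SG window width tracks the spread
        rows.append({
            # Fraction of neurons that spike at this timestep.
            "fixed_rate": float((u >= 1.0).mean()),
            "adaptive_rate": float((u >= theta).mean()),
            # Fraction of neurons that receive a nonzero surrogate gradient.
            "fixed_grad_cov": float((rect_surrogate(u, 1.0, 1.0) > 0).mean()),
            "adaptive_grad_cov": float((rect_surrogate(u, theta, width) > 0).mean()),
        })
    return rows


if __name__ == "__main__":
    for t, row in enumerate(simulate()):
        print(t, row)
```

In this toy setting the fixed threshold yields firing rates that swing widely as the distribution drifts, while the statistics-based threshold keeps them roughly constant, and the statistics-based SG window keeps the share of neurons with nonzero gradient stable. That is the qualitative behavior the abstract describes; the actual DS-ATGO update rules may differ substantially.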
