DS-ATGO: Dual-Stage Synergistic Learning via Forward Adaptive Threshold and Backward Gradient Optimization for Spiking Neural Networks

Jiaqiang Jiang, Wenfeng Xu, Jing Fan, Rui Yan

TL;DR

The paper addresses training instability in spiking neural networks caused by evolving membrane potential distributions and misaligned threshold and surrogate-gradient (SG) signals. It proposes DS-ATGO, a dual-stage approach with forward adaptive thresholding and backward threshold-driven SG optimization to maintain balanced firing and robust spatio-temporal gradients. Empirical results across CIFAR-10/100, CIFAR-10-DVS, and ImageNet show higher accuracy, more stable firing rates, and greater gradient availability in deeper layers, with favorable energy and latency characteristics. This joint MPD-aware mechanism improves the practicality of SNNs for complex, real-world tasks in neuromorphic computing.

Abstract

Brain-inspired spiking neural networks (SNNs) are recognized as a promising avenue for achieving efficient, low-energy neuromorphic computing. Direct training of SNNs typically relies on surrogate gradient (SG) learning to estimate derivatives of non-differentiable spiking activity. However, during training, the distribution of neuronal membrane potentials varies across timesteps and progressively deviates toward both sides of the firing threshold. When the firing threshold and SG remain fixed, this may lead to imbalanced spike firing and diminished gradient signals, preventing SNNs from performing well. To address these issues, we propose a novel dual-stage synergistic learning algorithm that achieves forward adaptive thresholding and backward dynamic SG. In forward propagation, we adaptively adjust thresholds based on the distribution of membrane potential dynamics (MPD) at each timestep, which enriches neuronal diversity and effectively balances firing rates across timesteps and layers. In backward propagation, drawing from the underlying association between MPD, threshold, and SG, we dynamically optimize SG to enhance gradient estimation through spatio-temporal alignment, effectively mitigating gradient information loss. Experimental results demonstrate that our method achieves significant performance improvements. Moreover, it allows neurons to fire stable proportions of spikes at each timestep and increases the proportion of neurons that obtain gradients in deeper layers.

Paper Structure

This paper contains 28 sections, 2 theorems, 11 equations, 15 figures, 5 tables, 1 algorithm.

Key Result

Theorem 1

For $U(t)$ that follows a normal distribution $N(\mu,\sigma^2)$, the probability that a random variable $U_i(t)$ exceeds $\mu+\sigma$ is relatively constant and given by $P(U_i(t) > \mu+\sigma)=1-\varPhi(1)$, where $\varPhi(\cdot)$ denotes the cumulative distribution function of the standard normal distribution ...
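
The probability in Theorem 1 can be checked numerically. The sketch below is illustrative only and is not taken from the paper: the helper names `normal_cdf` and `fraction_above`, the sample size, and the chosen $(\mu,\sigma)$ pairs are all assumptions made for this example. It evaluates $1-\varPhi(1)$ through the error function and compares it with the empirical fraction of Gaussian samples above $\mu+\sigma$.

```python
import math
import random


def normal_cdf(z: float) -> float:
    """CDF of the standard normal distribution, computed via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def fraction_above(mu: float, sigma: float, n: int = 100_000, seed: int = 0) -> float:
    """Empirical fraction of N(mu, sigma^2) samples that exceed mu + sigma.

    Hypothetical helper for illustration; the samples stand in for membrane
    potentials U_i(t) under the Gaussian assumption of Theorem 1.
    """
    rng = random.Random(seed)
    threshold = mu + sigma
    hits = sum(1 for _ in range(n) if rng.gauss(mu, sigma) > threshold)
    return hits / n


# Analytic value from Theorem 1: 1 - Phi(1), roughly 0.1587.
analytic = 1.0 - normal_cdf(1.0)

# The empirical fraction should stay close to the analytic value
# regardless of the (assumed) mean and standard deviation.
for mu, sigma in [(0.0, 1.0), (0.5, 0.2), (-2.0, 3.0)]:
    print(f"mu={mu}, sigma={sigma}: {fraction_above(mu, sigma):.4f} vs {analytic:.4f}")
```

Under the Gaussian assumption, the fraction stays near 16% for every $(\mu,\sigma)$ pair, which is the sense in which a threshold placed at $\mu+\sigma$ would yield a stable proportion of values above it.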

Figures (15)

  • Figure 1: The distributions of firing rates at different variances of membrane potential when $V_{th}=1.0$ (zheng2021going).
  • Figure 2: The distribution of membrane potentials deviating from the threshold in a vanilla SNN with ten timesteps. The case where almost all membrane potentials of neurons are beyond $V_{th}$ is called saturation; the converse case is called degeneration.
  • Figure 3: The overall framework of DS-ATGO. Internal dynamics of LIF neurons in a layer (gray). In forward propagation, the adaptive threshold (AT) mechanism promotes neurons to generate stable firing rates under different MPD distributions (green). In backward propagation, the threshold-driven SG optimization (TGO) method dynamically scales SG to respond to evolving MPD (yellow).
  • Figure 4: The structural similarity between the gray image and encoded images at different thresholds.
  • Figure 5: (a) The proportion of membrane potentials that fall into the gradient-available interval with fixed SG ($k=1$) under different distributions when using the adaptive threshold. (b) The proportion of gradient-available membrane potentials when using the adaptive threshold and threshold-driven gradient optimization.
  • ...and 10 more figures

Theorems & Definitions (4)

  • Theorem 1
  • proof
  • Theorem 1
  • proof