Rethinking Spiking Neural Networks from an Ensemble Learning Perspective

Yongqi Ding, Lin Zuo, Mengmeng Jing, Pei He, Hanpu Deng

TL;DR

This work reframes spiking neural networks (SNNs) as ensembles of temporal subnetworks and identifies excessive differences in initial membrane potentials across timesteps as a key source of unstable outputs and degraded performance. It introduces membrane potential smoothing to align initial states and temporally adjacent subnetwork guidance to stabilize outputs, both without changing the network architecture. Smoothing also eases forward information flow and backward gradient propagation. The method is evaluated on 1D, 2D, and 3D tasks, reaching 83.20% accuracy on CIFAR10-DVS with only four timesteps and strong results on SHD and DVS-Gesture. It is robust to hyperparameters and broadly applicable, offering a practical path to unlocking the potential of energy-efficient SNNs in diverse domains.

Abstract

Spiking neural networks (SNNs) exhibit superior energy efficiency but suffer from limited performance. In this paper, we consider SNNs as ensembles of temporal subnetworks that share architectures and weights, and highlight a crucial issue that affects their performance: excessive differences in initial states (neuronal membrane potentials) across timesteps lead to unstable subnetwork outputs, resulting in degraded performance. To mitigate this, we promote the consistency of the initial membrane potential distribution and of the output through membrane potential smoothing and temporally adjacent subnetwork guidance, respectively, to improve overall stability and performance. Moreover, membrane potential smoothing facilitates forward propagation of information and backward propagation of gradients, mitigating the notorious temporal gradient vanishing problem. Our method requires only minimal modification of the spiking neurons and no change to the network structure, which makes it generalizable: it shows consistent performance gains in 1D speech, 2D object, and 3D point cloud recognition tasks. In particular, on the challenging CIFAR10-DVS dataset, we achieve 83.20% accuracy with only four timesteps. This provides valuable insights into unleashing the potential of SNNs.
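
The ensemble view in the abstract can be illustrated with a small sketch: each timestep's output is treated as one subnetwork's prediction, the final prediction is their average, and adjacent timesteps are encouraged to agree. The abstract does not state the exact loss, so the KL-divergence form, its direction, the temperature, and all function names below are assumptions made purely for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_output(per_timestep_logits):
    """Average the per-timestep (subnetwork) outputs into one prediction."""
    num_steps = len(per_timestep_logits)
    num_classes = len(per_timestep_logits[0])
    return [
        sum(step[c] for step in per_timestep_logits) / num_steps
        for c in range(num_classes)
    ]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def adjacent_guidance_loss(per_timestep_logits, temperature=1.0):
    """Mean KL divergence between outputs at timesteps t and t+1.

    Assumption: the direction KL(p_t || p_{t+1}) and the uniform averaging
    are illustrative choices, not taken from the paper.
    """
    probs = [softmax(step, temperature) for step in per_timestep_logits]
    if len(probs) < 2:
        return 0.0  # a single timestep has no adjacent subnetwork
    pairs = zip(probs[:-1], probs[1:])
    return sum(kl_divergence(p, q) for p, q in pairs) / (len(probs) - 1)
```

In training, a term like `adjacent_guidance_loss` would typically be added to the task loss with a weighting factor; that factor is likewise an assumption here.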

Paper Structure

This paper contains 29 sections, 28 equations, 9 figures, 16 tables, 2 algorithms.

Figures (9)

  • Figure 1: Membrane potential distribution on CIFAR10-DVS, where $\mu$ and $\sigma$ denote the mean and standard deviation, respectively. Top: The membrane potential distribution of the vanilla SNN varies greatly across timesteps, which affects performance. Bottom: Our method allows for a more stable distribution with smaller differences across timesteps. See Appendix \ref{addvis} for more visualizations.
  • Figure 2: Illustration of (a) the vanilla LIF neuron and (b) the membrane potential smoothing. We smooth the membrane potential at timestep $t$ using the layer-shared coefficient $\alpha^l$ and the smoothed membrane potential $\tilde{H}^l_i(t-1)$ at timestep $t-1$ to reduce membrane potential differences and create additional information/gradient propagation pathways.
  • Figure 3: Visualization of the membrane potential distribution before (top) and after (bottom) smoothing. Smoothing reduces distribution differences, especially for $T1 \to T2 \to T3$.
  • Figure 4: Two-dimensional t-SNE visualization on the CIFAR10-DVS dataset. Top: The output of the vanilla SNN varies greatly across timesteps, and the overall output is confusing, making it difficult to distinguish between classes. Bottom: The output of our SNN is more stable across timesteps and more distinguishable across classes, especially for the first two timesteps.
  • Figure 5: The optimization trend of $\alpha$ during training. Values of $\alpha$ initialized differently gradually converge over training iterations, indicating that our method is insensitive to the initial value of $\alpha$.
  • ...and 4 more figures
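
The smoothing described in the Figure 2 caption can be sketched for a single neuron as follows. The caption says only that the membrane potential at timestep $t$ is smoothed with a layer-shared coefficient $\alpha^l$ and the smoothed potential $\tilde{H}^l_i(t-1)$. The convex-combination form $\tilde{H}(t) = \alpha H(t) + (1-\alpha)\tilde{H}(t-1)$, the particular LIF charging rule, the hard reset, and the names `lif_forward`, `tau`, and `v_th` are assumptions for illustration, not the paper's definitions.

```python
def lif_forward(inputs, alpha=None, tau=2.0, v_th=1.0):
    """Simulate one LIF neuron over len(inputs) timesteps.

    alpha=None gives the vanilla neuron. Otherwise the charged potential is
    mixed with the previous smoothed potential (assumed form):
        H~(t) = alpha * H(t) + (1 - alpha) * H~(t-1)
    Returns the (possibly smoothed) pre-spike potentials and the spikes.
    """
    v = 0.0                # potential carried over after the reset step
    h_smooth_prev = None   # H~(t-1); undefined before the first timestep
    potentials, spikes = [], []
    for x in inputs:
        # Leaky integration with reset potential 0 (one common LIF form).
        h = v + (x - v) / tau
        if alpha is not None and h_smooth_prev is not None:
            h = alpha * h + (1.0 - alpha) * h_smooth_prev
        h_smooth_prev = h
        spike = 1 if h >= v_th else 0
        v = 0.0 if spike else h  # hard reset on firing
        potentials.append(h)
        spikes.append(spike)
    return potentials, spikes


def total_variation(values):
    """Sum of absolute differences between consecutive values."""
    return sum(abs(b - a) for a, b in zip(values[:-1], values[1:]))


if __name__ == "__main__":
    drive = [0.2, 1.6, 0.2, 1.6]
    vanilla, _ = lif_forward(drive)
    smoothed, _ = lif_forward(drive, alpha=0.5)
    print("vanilla :", vanilla, total_variation(vanilla))
    print("smoothed:", smoothed, total_variation(smoothed))
```

In the paper $\alpha^l$ is learned per layer (Figure 5 shows its optimization trend); here it is a fixed float to keep the sketch dependency-free.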