Rethinking Spiking Neural Networks from an Ensemble Learning Perspective
Yongqi Ding, Lin Zuo, Mengmeng Jing, Pei He, Hanpu Deng
TL;DR
This work reframes spiking neural networks (SNNs) as ensembles of temporal subnetworks and identifies excessive differences in initial membrane potentials across timesteps as a key source of unstable outputs and degraded performance. It introduces membrane potential smoothing to align initial states and temporally adjacent subnetwork guidance to stabilize outputs, both without changing the network architecture. Membrane potential smoothing also eases forward information flow and backward gradient propagation. The approach is evaluated on 1D, 2D, and 3D tasks, reaching 83.20% accuracy on CIFAR10-DVS with only four timesteps and strong results on SHD and DVS-Gesture. The method is robust to hyperparameters and broadly applicable, offering a practical path to unlocking the potential of energy-efficient SNNs in diverse domains.
Abstract
Spiking neural networks (SNNs) exhibit superior energy efficiency but suffer from limited performance. In this paper, we consider SNNs as ensembles of temporal subnetworks that share architectures and weights, and highlight a crucial issue that affects their performance: excessive differences in initial states (neuronal membrane potentials) across timesteps lead to unstable subnetwork outputs, resulting in degraded performance. To mitigate this, we promote the consistency of the initial membrane potential distribution and of the output through membrane potential smoothing and temporally adjacent subnetwork guidance, respectively, to improve overall stability and performance. Moreover, membrane potential smoothing facilitates forward propagation of information and backward propagation of gradients, mitigating the notorious temporal gradient vanishing problem. Our method requires only minimal modification of the spiking neurons and no change to the network structure, which makes it broadly applicable; it yields consistent performance gains in 1D speech, 2D object, and 3D point cloud recognition tasks. In particular, on the challenging CIFAR10-DVS dataset, we achieve 83.20% accuracy with only four timesteps. These results provide valuable insights into unleashing the potential of SNNs.
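To make the two ideas in the abstract concrete, the following is a minimal NumPy sketch and not the authors' implementation. It assumes a leaky integrate-and-fire neuron with hard reset, models smoothing as a blend of each timestep's initial membrane potential with the previous smoothed initial state, and models adjacent guidance as a KL divergence between the output distributions of consecutive timesteps. The names `lif_forward`, `adjacent_guidance_loss`, and the coefficient `alpha` are hypothetical, and the exact formulas in the paper may differ.

```python
import numpy as np


def lif_forward(inputs, tau=2.0, v_th=1.0, alpha=None):
    """Forward pass of LIF neurons over T timesteps (no training logic).

    inputs: array of shape (T, N), input current per timestep.
    alpha:  hypothetical smoothing coefficient in [0, 1]; None disables smoothing.
    Returns (spikes, init_states), each of shape (T, N).
    """
    T, N = inputs.shape
    v = np.zeros(N)         # membrane potential carried into the next timestep
    v_smooth = np.zeros(N)  # smoothed initial state of the previous timestep
    spikes = np.zeros((T, N))
    init_states = np.zeros((T, N))
    for t in range(T):
        if alpha is not None and t > 0:
            # Assumed smoothing rule: pull this timestep's initial state
            # toward the previous smoothed initial state.
            v = alpha * v_smooth + (1.0 - alpha) * v
        v_smooth = v
        init_states[t] = v
        h = v + (inputs[t] - v) / tau  # leaky integration
        s = (h >= v_th).astype(float)  # fire when the threshold is reached
        spikes[t] = s
        v = h * (1.0 - s)              # hard reset to zero after a spike
    return spikes, init_states


def log_softmax(x):
    """Numerically stable log-softmax over the last axis."""
    shifted = x - x.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def adjacent_guidance_loss(logits):
    """Mean KL divergence between outputs of adjacent timesteps.

    logits: array of shape (T, C), one output per temporal subnetwork.
    The direction KL(p_t || p_{t+1}) is an assumption of this sketch.
    """
    log_p = log_softmax(logits[:-1])
    log_q = log_softmax(logits[1:])
    kl = (np.exp(log_p) * (log_p - log_q)).sum(axis=-1)
    return float(kl.mean())
```

In this sketch, `alpha=0.0` reproduces the unsmoothed neuron and `alpha=1.0` pins every initial state to the first one, so intermediate values trade off temporal memory against consistency of the initial states. A trainable version would additionally need a surrogate gradient for the spike function and, if `alpha` is learned, a parameterization that keeps it in [0, 1].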
