Deep activity propagation via weight initialization in spiking neural networks

Aurora Micheli, Olaf Booij, Jan van Gemert, Nergis Tömen

TL;DR

The paper tackles the challenge of training deep spiking neural networks by deriving an SNN-aware weight initialization that preserves activity across layers. By performing a variance-flow analysis that accounts for the spike-threshold activation, the authors obtain a closed-form weight variance $\text{Var}[w_l] = \frac{1}{n_l P(u_{l-1} > \theta)}$ that keeps $\text{Var}[u_l]$ constant across layers, enabling deep activity propagation. Empirical validation on networks of up to 100 layers and multiple time steps shows that this initialization maintains spike propagation and outperforms standard ANN-based initializations, yielding faster convergence and higher accuracy on MNIST-family datasets and CIFAR-10, with robustness to network width and neuron hyperparameters. The approach is presented as dataset-agnostic and architecture-agnostic, offering practical benefits for deploying deep SNNs on real tasks, though it currently abstracts away temporal leakage effects and could be extended to explicit temporal dynamics. Overall, the work provides a principled, theory-grounded initialization that significantly improves the trainability and efficiency of deep SNNs.
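
As a rough illustration of the closed-form variance quoted above, the following Python sketch evaluates $\text{Var}[w_l] = \frac{1}{n_l P(u_{l-1} > \theta)}$ and samples a weight matrix from it. It is not the authors' code. It assumes that the pre-activations $u_{l-1}$ are Gaussian with zero mean and unit variance, that $n_l$ is the number of inputs to the layer, and that weights are drawn from a zero-mean Gaussian; the function names are hypothetical.

```python
import math
import numpy as np


def firing_probability(theta):
    """P(u > theta), ASSUMING u ~ N(0, 1) (an assumption of this sketch)."""
    return 0.5 * math.erfc(theta / math.sqrt(2.0))


def snn_weight_variance(n_in, theta):
    """Weight variance 1 / (n_in * P(u > theta)) from the TL;DR formula.

    n_in is taken to be the layer fan-in (assumed meaning of n_l).
    """
    return 1.0 / (n_in * firing_probability(theta))


def init_weights(n_in, n_out, theta, rng):
    """Sample an (n_out, n_in) weight matrix with the variance above.

    Zero-mean Gaussian weights are an assumption of this sketch.
    """
    std = math.sqrt(snn_weight_variance(n_in, theta))
    return rng.normal(0.0, std, size=(n_out, n_in))


rng = np.random.default_rng(0)
weights = init_weights(n_in=100, n_out=50, theta=1.0, rng=rng)
```

Under the Gaussian assumption, $\theta = 0$ gives $P(u_{l-1} > \theta) = 1/2$, so the variance reduces to $2/n_l$, the Kaiming value for ReLU. For $\theta > 0$ fewer neurons fire and the formula calls for a larger weight variance to compensate.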

Abstract

Spiking Neural Networks (SNNs) and neuromorphic computing offer bio-inspired advantages such as sparsity and ultra-low power consumption, providing a promising alternative to conventional networks. However, training deep SNNs from scratch remains a challenge, as SNNs process and transmit information by quantizing the real-valued membrane potentials into binary spikes. This can lead to information loss and vanishing spikes in deeper layers, impeding effective training. While weight initialization is known to be critical for training deep neural networks, what constitutes an effective initial state for a deep SNN is not well-understood. Existing weight initialization methods designed for conventional networks (ANNs) are often applied to SNNs without accounting for their distinct computational properties. In this work we derive an optimal weight initialization method specifically tailored for SNNs, taking into account the quantization operation. We show theoretically that, unlike standard approaches, this method enables the propagation of activity in deep SNNs without loss of spikes. We demonstrate this behavior in numerical simulations of SNNs with up to 100 layers across multiple time steps. We present an in-depth analysis of the numerical conditions, regarding layer width and neuron hyperparameters, which are necessary to accurately apply our theoretical findings. Furthermore, our experiments on MNIST demonstrate higher accuracy and faster convergence when using the proposed weight initialization scheme. Finally, we show that the newly introduced weight initialization is robust against variations in several network and neuron hyperparameters.

Paper Structure (10 sections, 12 equations, 10 figures)

Figures (10)

  • Figure 1: Comparison of standard activation functions for ANNs (top) and SNNs (bottom): $\theta$ is the neuron firing threshold. When applied to the pre-activation distribution $u_{l-1}$ (left) the SNN thresholding mechanism (middle) generates binarized activations $x_l$ (right). The dark shaded areas of $u_{l-1}$ correspond to the fraction of neurons which will be activated and provide non-zero input to the next layer. With identical input distributions, this fraction is considerably lower for SNNs. This highlights why weight initializations optimized for ReLU will lead to vanishing activity in deep SNNs.
  • Figure 2: Propagation of $\bm{\text{Var}[u_l]}$ across network layers for (left) our initialization scheme and (right) Kaiming for six firing threshold values ($\bm{\theta}$): for all $\theta$, our proposed initialization method enables information propagation across all 100 layers. In contrast, Kaiming initialization leads to information dissipation across layers, particularly evident with threshold values close to the standard $\theta=1$. Each simulation was repeated 20 times, and the shaded areas represent the standard deviation over these runs.
  • Figure 3: Propagation of (left) $\bm{\text{Var}[u_l]}$ and (right) total number of spikes across network layers for our method and baseline approaches from the literature: comparing our method (blue line) with different initialization schemes from the literature for both ANNs and SNNs for $\theta=1$. Our proposed initialization method maintains a constant variance $\text{Var}[u_l]$, therefore enabling the propagation of spikes across all 100 layers. In contrast, other methods struggle to conserve activity. Each simulation was repeated 20 times, and the shaded areas represent the standard deviation over these runs.
  • Figure 4: Impact of the finite-size effect on the propagation of $\bm{\text{Var}[u_l]}$ across layers: as the number of neurons $n$ decreases and the spiking threshold $\theta$ increases, the discrepancy between empirical and theoretical results becomes more pronounced. (Left): $n = 100$. The network cannot conserve activity over depth for $\theta>0.85$. (Right): $n=600$. Increasing the number of neurons makes the finite-size effect less pronounced, even for higher values of $\theta$. Each simulation was repeated 20 times, and the shaded areas represent the standard deviation over these runs.
  • Figure 5: Propagation of $\text{Var}[\bm{u}_{l}^{t}]$ (top row) and number of spikes (bottom row) across layers and time steps for our initialization method (left) and Kaiming (right): our proposed weight initialization preserves activity and propagates spikes through 100 layers and 20 time steps. In contrast, with Kaiming initialization neuronal activity dies out after a few layers.
  • ...and 5 more figures
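
The qualitative behavior described in the captions of Figures 2 and 3 can be sketched with a small NumPy simulation. This is not the authors' experimental setup: it assumes a fully connected network of constant width, a single time step with no leak, zero-mean Gaussian weights, and an input layer whose pre-activations are drawn from $N(0, 1)$. The width, depth and function names are illustrative choices.

```python
import math
import numpy as np


def proposed_variance(n, theta):
    """1 / (n * P(u > theta)), ASSUMING u ~ N(0, 1)."""
    p_fire = 0.5 * math.erfc(theta / math.sqrt(2.0))
    return 1.0 / (n * p_fire)


def kaiming_variance(n, theta):
    """Kaiming variance for ReLU, 2 / fan_in (theta is ignored)."""
    return 2.0 / n


def propagate(n, depth, theta, weight_variance, rng):
    """Return the empirical Var[u_l] of each layer for one random network."""
    u = rng.normal(0.0, 1.0, size=n)  # assumed N(0, 1) input pre-activations
    variances = []
    for _ in range(depth):
        spikes = (u > theta).astype(np.float64)  # binary threshold activation
        std = math.sqrt(weight_variance(n, theta))
        w = rng.normal(0.0, std, size=(n, n))
        u = w @ spikes  # pre-activations of the next layer
        variances.append(float(u.var()))
    return variances


rng = np.random.default_rng(0)
ours = propagate(n=1000, depth=20, theta=1.0,
                 weight_variance=proposed_variance, rng=rng)
kaiming = propagate(n=1000, depth=20, theta=1.0,
                    weight_variance=kaiming_variance, rng=rng)
```

In this toy setting `ours` should fluctuate around 1 over all layers, while `kaiming` should collapse to zero within a few layers once no neuron crosses the threshold. Reducing `n` or raising `theta` makes the fluctuations larger, in line with the finite-size effect described for Figure 4.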