IKUN: Initialization to Keep snn training and generalization great with sUrrogate-stable variaNce

Da Chang, Deliang Wang, Xiao Yang

TL;DR

Hessian analysis reveals that IKUN-trained models converge to flatter minima, characterized by Hessian eigenvalues near zero on the positive side, promoting better generalization.

Abstract

Weight initialization significantly impacts the convergence and performance of neural networks. While traditional methods like Xavier and Kaiming initialization are widely used, they often fall short for spiking neural networks (SNNs), which have distinct requirements compared to artificial neural networks (ANNs). To address this, we introduce IKUN, a variance-stabilizing initialization method integrated with surrogate gradient functions, specifically designed for SNNs. IKUN stabilizes signal propagation, accelerates convergence, and enhances generalization. Experiments show IKUN improves training efficiency by up to 50%, achieving 95% training accuracy and 91% generalization accuracy. Hessian analysis reveals that IKUN-trained models converge to flatter minima, characterized by Hessian eigenvalues near zero on the positive side, promoting better generalization. The method is open-sourced for further exploration: https://github.com/MaeChd/SurrogateVarStabe.

Paper Structure

This paper contains 21 sections, 11 equations, 9 figures, 2 tables.

Figures (9)

  • Figure 1: Mechanism of the LIF neuron. This diagram illustrates the working principle of an LIF neuron, modeled as an RC circuit with leakage. When an input spike $I(t)$ causes the membrane potential $V(t)$ to accumulate and reach the threshold $V_{th}$, the neuron emits a spike and resets its potential to the resting value $V_{reset}$. If the threshold is not reached, the membrane potential gradually decays to the resting level, governed by the time constant $\tau = RC$.
  • Figure 2: Temporal logic in SNNs, adapted from wu2018spatio. The figure illustrates the basic principles of temporal logic in SNNs. Multiple neurons are connected via weighted synapses to form a layered network structure. Input signals are encoded before being fed into the network, with common encoding methods including temporal encoding and rate encoding. Temporal encoding represents information through spike intervals, capturing the temporal properties of the original data, while rate encoding uses spiking frequency, where stronger stimuli correspond to higher spike rates. These properties make SNNs efficient for processing time-sensitive data.
  • Figure 3: Activation function curve of $\sigma(x, \alpha) = \frac{1}{1 + \exp(-\alpha x)}$. As $\alpha$ increases, the function gradually approaches a step function.
  • Figure 4: Network architecture. The diagram illustrates the two-layer convolutional SNN architecture, where each layer comprises convolution, pooling, and spiking activation functions for efficient extraction and processing of temporal features in input data.
  • Figure 5: (a) and (c) show the changes in training and testing accuracy under the SGD optimizer, while (b) and (d) depict the training and testing loss curves. IKUN initialization achieves faster loss reduction in the early training stages and maintains higher stability, but in certain cases, its performance is slightly worse than other initialization methods.
  • ...and 4 more figures
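
The LIF mechanism in Figure 1 and the sigmoid curve in Figure 3 can be illustrated with a minimal sketch. It is not taken from the paper or its repository. The forward-Euler discretization, the hard reset, the default values of `tau`, `dt`, `v_th` and `v_reset`, and the function names are all assumptions made for illustration. Using the derivative of $\sigma(x, \alpha)$ as a surrogate gradient is a common choice in SNN training, shown here only as one plausible reading of the caption.

```python
import math


def lif_simulate(currents, tau=2.0, v_th=1.0, v_reset=0.0, dt=1.0):
    """Discrete-time leaky integrate-and-fire neuron (illustrative parameters).

    Returns the membrane potential after each step and the emitted spikes.
    """
    v = v_reset
    potentials, spikes = [], []
    for i_t in currents:
        # Leaky integration: decay toward the resting value plus input drive.
        v = v + (dt / tau) * (-(v - v_reset) + i_t)
        if v >= v_th:
            spikes.append(1)
            v = v_reset  # hard reset after a spike (an assumption)
        else:
            spikes.append(0)
        potentials.append(v)
    return potentials, spikes


def sigmoid(x, alpha):
    """sigma(x, alpha) = 1 / (1 + exp(-alpha * x)), in a numerically stable form."""
    z = alpha * x
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def surrogate_grad(x, alpha):
    """Derivative of sigma with respect to x, usable in place of the step's gradient."""
    s = sigmoid(x, alpha)
    return alpha * s * (1.0 - s)


if __name__ == "__main__":
    # A constant supra-threshold input makes the neuron charge, fire, and reset.
    potentials, spikes = lif_simulate([1.5, 1.5, 1.5, 1.5])
    print("spikes:", spikes)
    # Larger alpha pushes sigma toward a step function around x = 0.
    for alpha in (1.0, 10.0, 100.0):
        print(alpha, sigmoid(0.1, alpha), surrogate_grad(0.0, alpha))
```

In this sketch, `x` would be the membrane potential minus the threshold. Larger `alpha` makes the forward curve closer to a step, but it also concentrates the surrogate gradient in a narrow band around the threshold, so the choice is a trade-off.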