Efficiently Training Time-to-First-Spike Spiking Neural Networks from Scratch

Kaiwei Che, Wei Fang, Zhengyu Ma, Yifan Huang, Peng Xue, Li Yuan, Timothée Masquelier, Yonghong Tian

TL;DR

The paper tackles the challenge of training Time-to-First-Spike (TTFS) SNNs, which offer extreme sparsity and energy efficiency but suffer from unstable training and suboptimal accuracy. It introduces an efficient framework comprising the ETTFS-init parameter initialization, weight normalization with a learnable affine transform, temporal weighting decoding, and a re-evaluation of pooling (favoring average pooling), which together stabilize forward propagation and backward gradients. Empirical results on MNIST, Fashion-MNIST, CIFAR10, and DVS Gesture show state-of-the-art TTFS accuracy, faster training, and lower inference latency, surpassing prior TTFS approaches and remaining competitive with ANN-to-SNN conversion methods on some datasets. The work shows that aligning initialization, normalization, and pooling with TTFS dynamics is crucial for scalable TTFS SNNs on neuromorphic hardware.
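
A toy simulation can make the signal-diminishing problem of Figure 1(a) concrete. The sketch below is not the paper's setup: it assumes fully connected layers of equal width, non-leaky integrate-and-fire neurons with threshold 1, binary spikes with at most one spike per neuron, and weights drawn from U(-1/sqrt(fan_in), 1/sqrt(fan_in)), which is PyTorch's default `nn.Linear` initialization (a Kaiming-uniform variant). The function name `psc_variance_per_layer` and all parameter values are hypothetical, and the code does not implement ETTFS-init itself.

```python
import numpy as np


def psc_variance_per_layer(num_layers=4, width=256, T=8, threshold=1.0, seed=0):
    """Post-synaptic current (PSC) variance per layer in a toy TTFS network.

    Assumptions for illustration only: fully connected layers of equal width,
    non-leaky integrate-and-fire neurons, binary spikes, at most one spike per
    neuron, and weights ~ U(-1/sqrt(fan_in), 1/sqrt(fan_in)), which matches
    PyTorch's default nn.Linear initialization.
    """
    rng = np.random.default_rng(seed)
    # Input layer: every neuron fires exactly once, at a random time-step.
    spikes = np.zeros((T, width))
    spikes[rng.integers(0, T, size=width), np.arange(width)] = 1.0

    variances = []
    for _ in range(num_layers):
        bound = 1.0 / np.sqrt(width)
        W = rng.uniform(-bound, bound, size=(width, width))
        # PSC at each time-step, shape (T, width). For the first layer,
        # E[PSC^2] is about fan_in * Var(w) / T = 1 / (3T).
        current = spikes @ W.T
        variances.append(float(current.var()))

        v = np.zeros(width)                  # membrane potentials
        fired = np.zeros(width, dtype=bool)  # neurons that have already spiked
        out = np.zeros((T, width))
        for t in range(T):
            v += current[t]
            new = (v >= threshold) & ~fired  # a neuron may fire only once
            out[t, new] = 1.0
            fired |= new
        spikes = out
    return variances


if __name__ == "__main__":
    for layer, var in enumerate(psc_variance_per_layer()):
        print(f"layer {layer}: PSC variance = {var:.2e}")
```

Because every input neuron fires exactly once in T steps, the first-layer variance lands near 1/(3T). Only a small fraction of hidden neurons ever reach threshold, so the current variance shrinks sharply in deeper layers, which is the qualitative trend Figure 1(a) reports for Kaiming initialization.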

Abstract

Spiking Neural Networks (SNNs), with their event-driven and biologically inspired operation, are well-suited for energy-efficient neuromorphic hardware. Neural coding, critical to SNNs, determines how information is represented via spikes. Time-to-First-Spike (TTFS) coding, which uses a single spike per neuron, offers extreme sparsity and energy efficiency but suffers from unstable training and low accuracy due to its sparse firing. To address these challenges, we propose a training framework incorporating parameter initialization, training normalization, temporal output decoding, and a re-evaluation of the pooling layer. The proposed parameter initialization and training normalization mitigate diminishing signals and vanishing gradients, stabilizing training. The output decoding method aggregates temporal spikes to encourage earlier firing, thereby reducing inference latency. Re-evaluating the pooling layer shows that average pooling preserves the single-spike property, whereas max pooling should be avoided. Experiments show that the framework stabilizes and accelerates training, reduces latency, and achieves state-of-the-art accuracy for TTFS SNNs on MNIST (99.48%), Fashion-MNIST (92.90%), CIFAR10 (90.56%), and DVS Gesture (95.83%).
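
The temporal weighting decoding described in Figure 3 can be sketched in a few lines. Following the caption, a neuron i that fires once at t_i decodes to Y[i] = w[t_i], with w[t] decaying in t; over a spike train S this can be written as Y[i] = sum_t w[t] * S[t, i]. The specific decay shapes (exp(-t/tau) with a hypothetical tau, and (T - t)/T), the helper names, and the early-decision helper are illustrative assumptions, not the paper's definitions.

```python
import numpy as np


def decay_weights(T, kind="linear", tau=2.0):
    """Decoding weights w[t] that strictly decrease with the time-step t.

    The two shapes and the time constant tau are illustrative assumptions.
    """
    t = np.arange(T)
    if kind == "exp":
        return np.exp(-t / tau)
    if kind == "linear":
        return (T - t) / T  # 1 at t = 0, 1/T at t = T - 1
    raise ValueError(f"unknown decay kind: {kind}")


def temporal_weighting_decode(spikes, w):
    """Y[i] = sum_t w[t] * S[t, i] for a (T, num_classes) binary spike array.

    With at most one spike per output neuron this equals w[t_i] for a neuron
    that fires at t_i, and 0 for a neuron that stays silent.
    """
    return w @ spikes


def first_decision_step(spikes):
    """Earliest time-step at which any output neuron fires (None if all silent)."""
    steps = np.flatnonzero(spikes.any(axis=1))
    return int(steps[0]) if steps.size else None


# Usage: 3 output neurons, T = 4; neuron 0 fires at t = 1, neuron 1 at t = 3.
S = np.zeros((4, 3))
S[1, 0] = 1.0
S[3, 1] = 1.0
Y = temporal_weighting_decode(S, decay_weights(4, "linear"))
print(Y, "-> predicted class", int(Y.argmax()), "decided at step", first_decision_step(S))
```

With T = 4 and the linear decay, the neuron firing at t = 1 decodes to 0.75, the one firing at t = 3 decodes to 0.25, and the silent neuron decodes to 0, so the earliest-firing neuron wins. Since w[t] is strictly decreasing and each neuron fires at most once, the first output spike already fixes the argmax (barring ties at the same step), which is one way such a decoder can shorten inference.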

Paper Structure

This paper contains 31 sections, 30 equations, 10 figures, 7 tables.

Figures (10)

  • Figure 1: An overview of the impacts and performance of the proposed methods. (a) The default Kaiming initialization causes the signal diminishing problem in TTFS SNNs: the post-synaptic current variance ($\sigma^2$) decreases rapidly across layers ($L$). (b) The proposed ETTFS-init regulates these distributions. (c) ETTFS-init accelerates convergence and improves accuracy over Kaiming initialization, and weight normalization improves both further. (d) Our decoding method reduces the average number of inference time-steps compared with the previous TQ-TTFS decoding yang2024tq across four datasets.
  • Figure 2: Physical latency comparison in a $3$-layer SNN ($T=4$). Layer-by-layer propagation (common in prior TTFS SNNs) requires each layer to process the full input sequence before the next layer can start, so latency grows with network depth. Step-by-step propagation (our method) transmits spikes between layers immediately at each time-step.
  • Figure 3: The mechanism of temporal weighting decoding. The weight $w[t]$ is an exponential or linear decay function of $t$, so an earlier spike is decoded to a larger value and dominates the output. For example, if neuron $i$ fires at $t_i$, earlier than neuron $j$ fires at $t_j$, then $Y[i] = w[t_i] > Y[j] = w[t_j]$.
  • Figure 4: Example of pooling in TTFS SNNs.
  • Figure 5: Comparison of weight gradients in TTFS SNNs initialized with Kaiming initialization (first row) and with the proposed ETTFS-init method (second row). We show histograms of $\frac{\partial \mathcal{L}}{\partial W^{l}}$ for (a) the first layer ($l=0$) and (b) the last layer ($l=4$). Kaiming initialization yields extremely small gradients, on a scale below $10^{-7}$, whereas ETTFS-init yields appropriately scaled gradients, above $10^{-3}$.
  • ...and 5 more figures
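
One way to see the pooling claim behind Figure 4 and the abstract is to apply pooling to a 2×2 window of single-spike inputs at every time-step, as step-by-step propagation would. The sketch assumes four inputs that each fire once, at t = 0, 1, 2, 3 (a hypothetical pattern), a hypothetical next-layer weight `w`, and a helper `pool_over_time` introduced only for illustration; it is not the paper's pooling implementation.

```python
import numpy as np


def pool_over_time(spikes, op):
    """Apply a pooling reduction to one 2x2 window at every time-step.

    spikes: binary array of shape (T, 2, 2); each input fires at most once.
    op: a NumPy reduction such as np.max or np.mean.
    Returns a length-T array holding the pooled value at each time-step.
    """
    flat = spikes.reshape(spikes.shape[0], -1)  # (T, 4)
    return op(flat, axis=1)


T = 4
window = np.zeros((T, 2, 2))
# Hypothetical pattern: the four inputs fire once each, at t = 0, 1, 2, 3.
for t, (r, c) in enumerate([(0, 0), (0, 1), (1, 0), (1, 1)]):
    window[t, r, c] = 1.0

max_out = pool_over_time(window, np.max)   # 1.0 at every step: 4 output spikes
avg_out = pool_over_time(window, np.mean)  # 0.25 at every step: a graded input

print("max-pool spikes:", int(max_out.sum()))
print("avg-pool output:", avg_out)

# Average pooling is linear, so a next-layer weight w applied to the pooled
# value equals weight w/4 applied directly to each original single-spike input.
w = 0.8  # hypothetical next-layer weight
folded = (window.reshape(T, -1) * (w / 4)).sum(axis=1)
assert np.allclose(w * avg_out, folded)
```

Max pooling emits a 1 at each of the four steps, so the pooled unit effectively spikes four times and breaks the one-spike-per-neuron property. Average pooling emits 0.25 per step; because it is a fixed linear map, the next layer can treat it as weight w/4 on each original input, each of which still fired only once.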