Efficiently Training Time-to-First-Spike Spiking Neural Networks from Scratch
Kaiwei Che, Wei Fang, Zhengyu Ma, Yifan Huang, Peng Xue, Li Yuan, Timothée Masquelier, Yonghong Tian
TL;DR
The paper tackles the challenge of training Time-to-First-Spike (TTFS) SNNs, which offer extreme sparsity and energy efficiency but suffer from unstable training and suboptimal accuracy. It introduces an efficient framework with four components: the ETTFS-init parameter initialization, weight normalization with a learnable affine transform, temporal weighting decoding, and a re-evaluation of pooling that favors average pooling. Together these stabilize forward propagation and backward gradients. Empirical results on MNIST, Fashion-MNIST, CIFAR10, and DVS Gesture show state-of-the-art TTFS accuracy, faster training, and lower inference latency, surpassing prior TTFS approaches while remaining competitive with conversion-based methods on some datasets. The work demonstrates that careful alignment of initialization, normalization, and pooling with TTFS dynamics is crucial for scalable TTFS SNNs on neuromorphic hardware.
Abstract
Spiking Neural Networks (SNNs), with their event-driven and biologically inspired operation, are well-suited for energy-efficient neuromorphic hardware. Neural coding, critical to SNNs, determines how information is represented via spikes. Time-to-First-Spike (TTFS) coding, which uses a single spike per neuron, offers extreme sparsity and energy efficiency but suffers from unstable training and low accuracy due to its sparse firing. To address these challenges, we propose a training framework incorporating parameter initialization, training normalization, temporal output decoding, and pooling layer re-evaluation. The proposed parameter initialization and training normalization mitigate diminishing signals and vanishing gradients to stabilize training. The output decoding method aggregates temporal spikes to encourage earlier firing, thereby reducing latency. The re-evaluation of the pooling layer indicates that average-pooling preserves the single-spike property and that max-pooling should be avoided. Experiments show the framework stabilizes and accelerates training, reduces latency, and achieves state-of-the-art accuracy for TTFS SNNs on MNIST (99.48%), Fashion-MNIST (92.90%), CIFAR10 (90.56%), and DVS Gesture (95.83%).
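The pooling claim can be illustrated with a toy example. The sketch below is not taken from the paper: the 2x2 window, the five time steps, the spike times, and the helper names max_pool and avg_pool are all assumptions chosen for illustration. It applies each pooling operation independently at every time step to four single-spike inputs. One plausible reading of the abstract's claim is that max-pooling can emit more than one spike per output unit when inputs fire at different times, while average-pooling is a linear operation that only rescales the input passed to the next layer.

```python
# Toy 2x2 pooling window: four input neurons over T = 5 time steps.
# Each row is one neuron's spike train; under TTFS coding a neuron
# fires at most once. All values here are illustrative assumptions.
window = [
    [0, 1, 0, 0, 0],  # fires at t = 1
    [0, 0, 0, 1, 0],  # fires at t = 3
    [0, 0, 0, 1, 0],  # fires at t = 3
    [0, 0, 0, 0, 0],  # never fires
]


def max_pool(trains):
    """Max over the window, taken separately at each time step."""
    return [max(column) for column in zip(*trains)]


def avg_pool(trains):
    """Mean over the window, taken separately at each time step."""
    return [sum(column) / len(trains) for column in zip(*trains)]


max_out = max_pool(window)
avg_out = avg_pool(window)

# The max-pooled unit is active at two different time steps, so it no
# longer behaves like a single-spike neuron.
print("max-pool output:", max_out, "active steps:", sum(max_out))

# The average-pooled output is a graded value, not a spike. Its total
# over time cannot exceed 1 because each input fires at most once.
print("avg-pool output:", avg_out, "total:", sum(avg_out))
```

In this toy case the max-pooled unit is active at t = 1 and again at t = 3, whereas the average-pooled values sum to 0.75 over time and can be treated as a weighted input to the following layer.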
