Spikingformer: A Key Foundation Model for Spiking Neural Networks

Chenlin Zhou; Liutao Yu; Zhaokun Zhou; Han Zhang; Jiaqi Wang; Huihui Zhou; Zhengyu Ma; Yonghong Tian

TL;DR

Spikingformer introduces a spike-driven Transformer backbone by integrating MS Residual with Self-Attention to eliminate non-spike computations that hinder energy efficiency in Spikformer. The approach preserves global modeling capabilities via Spiking Self Attention and a Spiking Tokenizer, achieving strong performance across ImageNet, CIFAR, neuromorphic data, and GLUE with significantly reduced energy consumption. The authors provide theoretical energy analysis, extensive experiments on 13 datasets, and detailed Appendix resources to support deployment on neuromorphic hardware, positioning Spikingformer as a robust foundation model for energy-efficient AI. The work demonstrates that spike-driven residuals can sustain high accuracy while maintaining event-driven computation, advancing practical SNNs for diverse tasks.

Abstract

Spiking neural networks (SNNs) offer a promising energy-efficient alternative to artificial neural networks, due to their event-driven spiking computation. However, some foundation SNN backbones (including Spikformer and SEW ResNet) suffer from non-spike computations (integer-float multiplications) caused by the structure of their residual connections. These non-spike computations increase SNNs' power consumption and make them unsuitable for deployment on mainstream neuromorphic hardware. In this paper, we analyze the spike-driven behavior of the residual connection methods in SNNs. We then present Spikingformer, a novel spiking transformer backbone that merges the MS Residual connection with Self-Attention in a biologically plausible way to address the non-spike computation challenge in Spikformer while maintaining global modeling capabilities. We evaluate Spikingformer across 13 datasets spanning large static images, neuromorphic data, and natural language tasks, and demonstrate the effectiveness and universality of Spikingformer, setting a vital benchmark for spiking neural networks. In addition, with the spike-driven features and global modeling capabilities, Spikingformer is expected to become a more efficient general-purpose SNN backbone towards energy-efficient artificial intelligence. Code: https://github.com/TheBrainLab/Spikingformer

Paper Structure (26 sections, 19 equations, 5 figures, 7 tables)

Introduction
Related Work
Convolution-based Spiking Neural Network
Transformer-based Spiking Neural Network
Methods
Spiking Neuron Model
Spike-Driven Behavior in SNN Residual Learning
Spikingformer
Theoretical Energy Consumption Calculation
Experiments
ImageNet-1k Classification
CIFAR and Neuromorphic Tasks
Natural Language Understanding
Discussion
Conclusion
...and 11 more sections

Figures (5)

Figure 1: Residual learning in Spikformer and Spikingformer. (a) shows the SEW Residual learning of Spikformer, which contains non-spike computation (integer-float multiplications) in the $\operatorname{ConvBN}$ layer. (b) shows the MS Residual connection, which is adopted in Spikingformer. MS Residual avoids integer-float multiplications and thus follows the spike-driven principle.
Figure 2: Overview of Spikingformer, which consists of a Spiking Tokenizer, several Spiking Transformer Blocks, and a Classification Head. Multistep LIF is the Leaky Integrate-and-Fire (LIF) neuron model (Fang et al., 2021; Zhou et al., 2023) with time steps $T>1$. As in Spikformer, $T$ is an independent dimension for the spiking neuron layers; in other layers, it is merged with the batch size. $\operatorname{ConvBN}$ denotes a convolution layer and its subsequent BN layer.
Figure 3: The spike-driven behavior of Spikingformer and Spikformer. (a) Histogram of the input data of each block in Spikformer-8-512. The abscissa is the non-spike data range $\{0, 1, 2, ..., 16\}$ before the $\operatorname{Conv}$ layer in the transformer block of Spikformer. The nonzero ratio is the fraction of non-zero input units for each block. (b) Histogram of the input data of each block in Spikingformer-8-512. The abscissa is the binary spike data $\{0, 1\}$ before the $\operatorname{Conv}$ layer in the transformer block of Spikingformer. The ordinate is the number of neurons taking each value in $\{0, 1\}$.
Figure 4: Firing patterns. (a) Layer-wise firing rates of Spikingformer-8-768 on ImageNet. "ST" denotes the Spiking Tokenizer. (b) Firing patterns of 196 $\times$ 96 neurons from layer 7 (red circle on the left) of Spikingformer-8-768 on ImageNet, where the white dots represent firing.
Figure 5: The attention map visualization of PSSA in Spikingformer.
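
The contrast described in Figures 1 and 3 can be illustrated with a toy NumPy sketch. It is not the authors' implementation: the Heaviside `spike` function (no leak, no time steps), the dense `linear` layer standing in for $\operatorname{ConvBN}$, and the names `sew_block` and `ms_block` are all simplifying assumptions made for illustration. The sketch only shows which values reach the weight layer under each residual style.

```python
import numpy as np

rng = np.random.default_rng(0)

def spike(x, threshold=1.0):
    """Toy spiking neuron (assumption): Heaviside step, no leak or state."""
    return (x >= threshold).astype(np.float32)

def linear(x, w):
    """Stand-in for ConvBN (assumption): a float-weight matrix product."""
    return x @ w

def sew_block(s_in, w):
    # SEW-style residual: the shortcut adds spikes to spikes, so the block
    # output is an integer count that feeds the next weight layer directly.
    return s_in + spike(linear(s_in, w))

def ms_block(u_in, w):
    # MS-style residual: the shortcut carries real-valued state, and the
    # spiking neuron sits in front of the weight layer.
    return u_in + linear(spike(u_in), w)

dim, depth = 64, 4
weights = [rng.normal(0.0, 0.5, size=(dim, dim)).astype(np.float32)
           for _ in range(depth)]

# Record what the weight layer receives in each SEW-style block.
s = spike(rng.normal(1.0, 1.0, size=(8, dim)))
sew_inputs = []
for w in weights:
    sew_inputs.append(s.copy())
    s = sew_block(s, w)

# Record what the weight layer receives in each MS-style block.
u = rng.normal(1.0, 1.0, size=(8, dim)).astype(np.float32)
ms_inputs = []
for w in weights:
    ms_inputs.append(spike(u))
    u = ms_block(u, w)

sew_max = float(max(x.max() for x in sew_inputs))
ms_values = set(np.unique(np.concatenate([x.ravel() for x in ms_inputs])).tolist())

print("largest SEW-style layer input:", sew_max)
print("MS-style layer input values:", sorted(ms_values))
```

In this sketch the SEW-style inputs grow beyond 1 as blocks are stacked, so the weight layer has to multiply integers by floats, while the MS-style inputs stay in $\{0, 1\}$, so the weight layer reduces to selecting and accumulating weights.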
