Towards High-performance Spiking Transformers from ANN to SNN Conversion

Zihan Huang, Xinyu Shi, Zecheng Hao, Tong Bu, Jianhao Ding, Zhaofei Yu, Tiejun Huang

TL;DR

This paper tackles the conversion of Transformers to Spiking Neural Networks (SNNs). It introduces the Expectation Compensation Module (ECM), which preserves the behavior of nonlinear modules during conversion, and the Multi-Threshold (MT) neuron with Parallel Parameter normalization, which reduces latency and energy consumption. The ECM computes time-step-dependent expected outputs for nonlinear components and matrix products so that accuracy is preserved after conversion, while MT neurons distribute spikes across multiple thresholds to curb power and delay. The approach, termed ECMT, reports state-of-the-art results on ImageNet1k (e.g., 88.60% top-1 accuracy with 4 time steps at ~35% of the original energy for EVA) and shows low-latency, energy-efficient operation across ViT variants and on CIFAR datasets. According to the authors, it is the first ANN-to-SNN conversion for Transformers to reach high accuracy on complex datasets. The work targets energy-efficient neuromorphic implementations of large-scale vision transformers for real-time, low-power AI systems, and open-source code is provided for reproducibility. The theoretical contributions include formal results (Theorem 1 and Theorem 2) linking SNN outputs to ANN expectations and an EC framework for matrix products, together with a hardware-aware neuron design aimed at practical deployment.

Abstract

Spiking neural networks (SNNs) show great potential due to their energy efficiency, fast processing capabilities, and robustness. There are two main approaches to constructing SNNs. Direct training methods require a large amount of memory, while conversion methods offer a simpler and more efficient option. However, current conversion methods mainly focus on converting convolutional neural networks (CNNs) to SNNs. Converting Transformers to SNNs is challenging because of the presence of non-linear modules. In this paper, we propose an Expectation Compensation Module to preserve the accuracy of the conversion. The core idea is to use information from the previous T time-steps to calculate the expected output at time-step T. We also propose a Multi-Threshold Neuron and the corresponding Parallel Parameter normalization to address the large number of time steps otherwise needed for high accuracy, aiming to reduce network latency and power consumption. Our experimental results demonstrate that our approach achieves state-of-the-art performance. For example, we achieve a top-1 accuracy of 88.60% with only a 1% loss in accuracy using 4 time steps while consuming only 35% of the original power of the Transformer. To our knowledge, this is the first successful Artificial Neural Network (ANN) to SNN conversion for Spiking Transformers that achieves high accuracy, low latency, and low power consumption on complex datasets. The source code of the proposed method is available at https://github.com/h-z-h-cell/Transformer-to-SNN-ECMT.

Paper Structure

This paper contains 37 sections, 4 theorems, 26 equations, 6 figures, 7 tables, 1 algorithm.

Key Result

Theorem 1

Consider a non-linear layer $l$ with a function $F$. In SNNs, the output of this layer at time $t$ is denoted as $\bm{O}^{l}(t)$. Let $\bm{S}^l(T)$ be the cumulative sum of layer $l$ outputs up to time $T$, given by $\bm{S}^l(T)=\sum_{t=1}^T\bm{O}^{l}(t)$. The expected output of the SNNs at time $T$
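The cumulative-sum notation $\bm{S}^l(T)$ and the core idea stated in the abstract (using information accumulated over the first T time-steps to produce the expected output at time-step T) can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the module emits, at each step, the difference between successive estimates $T \cdot F(\text{mean input up to } T)$, so that the outputs telescope. The names `expectation_compensation` and `softmax` are hypothetical helpers introduced here.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over the last axis (example nonlinearity F)."""
    z = np.exp(x - x.max(axis=-1, keepdims=True))
    return z / z.sum(axis=-1, keepdims=True)


def expectation_compensation(inputs, fn):
    """Sketch of an expectation-compensation step for a nonlinear function fn.

    inputs: array of shape (T, ...) holding the per-step inputs to the layer.
    Returns per-step outputs O(t) of shape (T, ...) such that the cumulative
    sum S(T) = sum_{t<=T} O(t) equals T * fn(mean of the inputs up to T).
    ASSUMPTION: this difference-of-estimates rule is an illustration only.
    """
    cum_in = np.zeros_like(inputs[0], dtype=float)  # running sum of inputs
    prev_total = 0.0                                # S(T-1); S(0) = 0
    outputs = []
    for t, x in enumerate(inputs, start=1):
        cum_in = cum_in + x
        total = t * fn(cum_in / t)          # target cumulative output S(t)
        outputs.append(total - prev_total)  # emit only the correction O(t)
        prev_total = total
    return np.stack(outputs)
```

Under this assumption, the time-averaged output after any number of steps matches the nonlinearity applied to the time-averaged input, which is the property a conversion method needs from a nonlinear layer. For example, calling `expectation_compensation(x, softmax)` on an array `x` of shape `(T, tokens, dim)` returns per-step corrections whose mean over the first axis equals `softmax(x.mean(axis=0))`.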

Figures (6)

  • Figure 1: An overview of the proposed architecture, including the whole architecture, Attention, and MLP module.
  • Figure 2: The upper diagram shows the general Expectation Compensation (EC) module. The lower diagram shows the Expectation Compensation module for Matrix Product (Matrix Product-EC).
  • Figure 3: Diagram of the MT neuron. The MT neuron receives input from nonlinear/linear modules and emits up to one spike.
  • Figure 4: Left: Original connection in the ANN. Right: Parallel Parameter normalization of the MT neuron in the SNN. The MT neuron extends one connection to $2n$ channels. At each time, only one of the $2n$ channels can emit a spike.
  • Figure 5: Accuracy under different numbers and sizes of thresholds on ViT-S/16, where $2n$ denotes the number of thresholds.
  • ...and 1 more figure
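
The MT neuron behavior described in the captions of Figures 3 and 4 ($2n$ thresholds, at most one spike per time step) can be sketched as follows. This is a hedged illustration, not the paper's definition: it assumes $n$ positive thresholds that halve from a base value, $n$ mirrored negative thresholds, a largest-threshold-first firing rule, and a soft reset that subtracts the fired threshold. The function name `mt_neuron_run` and its parameters are hypothetical.

```python
import numpy as np


def mt_neuron_run(currents, base_threshold=1.0, n=4):
    """Simulate one multi-threshold neuron over a sequence of input currents.

    ASSUMPTIONS (illustrative only): thresholds are +/- base_threshold / 2**k
    for k = 0..n-1, giving 2n thresholds in total; the neuron fires the
    largest-magnitude threshold that the membrane potential reaches, and the
    fired value is subtracted from the potential (soft reset).
    Returns (signed spike values per step, final membrane potential).
    """
    levels = base_threshold / 2.0 ** np.arange(n)  # descending magnitudes
    v = 0.0
    out = []
    for i in currents:
        v += i                      # integrate the input current
        spike = 0.0                 # default: no spike at this step
        for th in levels:           # check the largest threshold first
            if v >= th:
                spike = th          # positive channel fires
                break
            if v <= -th:
                spike = -th         # negative channel fires
                break
        v -= spike                  # soft reset by the emitted amount
        out.append(spike)
    return np.array(out), v
```

With this rule, exactly one of the $2n$ channels (or none) is active at each step, and the emitted values plus the residual membrane potential always account for the total input, so larger thresholds let the neuron transmit large activations in fewer time steps.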

Theorems & Definitions (4)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Theorem 4