SpikeZIP-TF: Conversion is All You Need for Transformer-based SNN
Kang You, Zekai Xu, Chen Nie, Zhijie Deng, Qinghai Guo, Xiang Wang, Zhezhi He
TL;DR
Transformer-based SNNs have lagged behind their ANN counterparts in accuracy when converted directly. SpikeZIP-TF achieves near-lossless ANN-to-SNN conversion by introducing spike-equivalent operators (SESA, Spike-Softmax, and Spike-LayerNorm) paired with an ST-BIF+ neuron, so that the SNN remains equivalent to an ANN with quantized activations. The approach delivers state-of-the-art results among Transformer-based SNNs on ImageNet (top-1 $83.82\%$) and SST-2 ($93.79\%$), with ultra-low latency (as few as $8$ time-steps) and favorable power-accuracy trade-offs. It also reduces training cost by starting from pre-trained ANNs. This enables efficient neuromorphic deployment of Transformer models across computer vision and natural language tasks and offers a practical pathway toward ultra-low-latency, energy-efficient inference.
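The equivalence between a quantized ANN activation and accumulated spikes can be illustrated with a toy model. The sketch below is not the ST-BIF+ neuron as specified in the paper. It is a simplified, hypothetical bipolar integrate-and-fire neuron with a spike tracer, written with NumPy. It assumes a membrane potential initialized at θ/2 (so the spike count rounds to nearest), a firing threshold θ, and an ANN activation quantized to `levels` steps of size θ. The function names `quantized_relu` and `bipolar_if_with_tracer` are illustrative only. Under these assumptions, once the inputs stop and the neuron settles, θ times the net spike count equals the quantized activation of the summed input exactly.

```python
import numpy as np


def quantized_relu(x, theta, levels):
    """ANN-side activation: round x/theta to the nearest step, clipped to [0, levels]."""
    return theta * np.clip(np.floor(x / theta + 0.5), 0, levels)


def bipolar_if_with_tracer(inputs, theta, levels, extra_steps):
    """Toy bipolar integrate-and-fire neuron with a spike tracer (hypothetical model).

    inputs: array of shape (T_in, N); row t is the input current at time step t.
    extra_steps: zero-input steps appended so the neuron can settle.
    Returns theta * (net spike count) per neuron, i.e. the accumulated output.
    """
    t_in, n = inputs.shape
    v = np.full(n, theta / 2.0)  # membrane potential; the theta/2 offset gives round-to-nearest
    tracer = np.zeros(n)         # net number of spikes emitted so far
    for t in range(t_in + extra_steps):
        v = v + (inputs[t] if t < t_in else 0.0)
        # At most one spike per step: +1 if above threshold and below the cap,
        # -1 if the potential went negative and earlier spikes can be retracted.
        pos = (v >= theta) & (tracer < levels)
        neg = (v < 0) & (tracer > 0)
        spikes = pos.astype(float) - neg.astype(float)
        v = v - spikes * theta   # subtract theta for a +1 spike, add it back for a -1 spike
        tracer = tracer + spikes
    return theta * tracer


# Inputs are multiples of 1/64 so every sum is exact in floating point.
rng = np.random.default_rng(0)
theta, levels, t_in = 0.25, 15, 8
inputs = rng.integers(-40, 41, size=(t_in, 1000)) / 64.0
snn_out = bipolar_if_with_tracer(inputs, theta, levels, extra_steps=levels)
ann_out = quantized_relu(inputs.sum(axis=0), theta, levels)
print(np.array_equal(snn_out, ann_out))  # True

# A +1 spike fired at t=0 is retracted by a -1 spike at t=1, since the total input is 0.
print(bipolar_if_with_tracer(np.array([[0.5], [-0.5]]), theta, levels, levels))  # [0.]
```

In this toy model, the negative spikes let the neuron undo spikes fired early when later inputs are negative, so the output tracks the quantized value of the total input rather than its running maximum. The upper cap on the tracer plays the role of the activation's clipping level. With `extra_steps = levels`, the neuron is guaranteed to settle, because each zero-input step moves the spike count one step closer to its target.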
Abstract
Spiking neural networks (SNNs) have attracted great attention due to their high efficiency and accuracy. Current ANN-to-SNN conversion methods can produce CNN-based SNNs whose accuracy is on par with their artificial neural network (ANN) counterparts at ultra-low latency (8 time-steps) on computer vision (CV) tasks. However, although Transformer-based networks have achieved leading accuracy in both CV and natural language processing (NLP), Transformer-based SNNs still suffer lower accuracy than their ANN counterparts. In this work, we introduce a novel ANN-to-SNN conversion method called SpikeZIP-TF, in which the ANN and the SNN are exactly equivalent, thus incurring no accuracy degradation. SpikeZIP-TF achieves 83.82% accuracy on a CV dataset (ImageNet) and 93.79% accuracy on an NLP dataset (SST-2), surpassing state-of-the-art (SOTA) Transformer-based SNNs. The code is available on GitHub: https://github.com/Intelligent-Computing-Research-Group/SpikeZIP_transformer
