Binary Event-Driven Spiking Transformer

Honglin Cao; Zijian Zhou; Wenjie Wei; Ammar Belatreche; Yu Liang; Dehao Zhang; Malu Zhang; Yang Yang; Haizhou Li

TL;DR

BESTformer introduces 1-bit weight and attention binarization to create a compact Transformer-based SNN. The key challenge is the information loss caused by binarization, which the Coupled Information Enhancement (CIE) framework addresses with a reversible encoder and information-enhanced distillation. Experiments on CIFAR-10/100, CIFAR10-DVS, and ImageNet-1k show state-of-the-art performance among binary SNNs, together with substantial reductions in model size and in Neuromorphic Synaptic Arithmetic Computation Effort (NS-ACE), which makes the model suitable for edge-device deployment. The work points to a viable path toward high-performance, ultra-efficient SNNs at scale.
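To make the information-loss argument concrete, the sketch below binarizes a weight vector with the sign function and a mean-absolute-value scaling factor (a common XNOR-Net-style scheme; BESTformer's exact binarization function, scaling, and gradient estimator are not specified here and may differ). It then enumerates every pre-activation that binary spikes can produce against 1-bit weights. The helper names `binarize_weights` and `value_set` are hypothetical.

```python
from itertools import product


def binarize_weights(weights):
    """Sign binarization with a per-tensor scale (an assumed XNOR-Net-style scheme).

    Returns the {-1, +1} weights and alpha = mean(|w|), so that alpha * b
    approximates the full-precision weights.
    """
    alpha = sum(abs(w) for w in weights) / len(weights)
    binary = [1 if w >= 0 else -1 for w in weights]
    return binary, alpha


def value_set(num_inputs):
    """All values sum_i s_i * b_i can take with spikes s_i in {0, 1} and
    binary weights b_i in {-1, +1}. The scale alpha is a common factor,
    so it does not change the number of distinct values and is ignored."""
    values = set()
    for spikes in product((0, 1), repeat=num_inputs):
        for signs in product((-1, 1), repeat=num_inputs):
            values.add(sum(s * b for s, b in zip(spikes, signs)))
    return values


if __name__ == "__main__":
    b, alpha = binarize_weights([0.75, -0.25, 0.5, -1.5])
    print(b, alpha)              # [1, -1, 1, -1] 0.75
    print(sorted(value_set(3)))  # [-3, -2, -1, 0, 1, 2, 3]
```

With K inputs, the pre-activation can take only the 2K + 1 integers in [-K, K] (7 values for K = 3), whereas full-precision weights give a practically continuous range. This shrunken value set is one way to see the representation loss that the CIE method is designed to counter. Training such a model would also need a surrogate gradient for the sign function (for example a straight-through estimator), which this sketch omits.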

Abstract

Transformer-based Spiking Neural Networks (SNNs) introduce a novel event-driven self-attention paradigm that combines the high performance of Transformers with the energy efficiency of SNNs. However, the larger model size and increased computational demands of the Transformer architecture limit their practicality in resource-constrained scenarios. In this paper, we integrate binarization techniques into Transformer-based SNNs and propose the Binary Event-Driven Spiking Transformer, i.e., BESTformer. BESTformer significantly reduces storage and computational demands by representing weights and attention maps with only 1 bit. However, because of the limited representation capability of binarization, BESTformer suffers a severe performance drop relative to its full-precision counterpart. To address this issue, we propose a Coupled Information Enhancement (CIE) method, which consists of a reversible framework and information-enhanced distillation. By maximizing the mutual information between the binary model and its full-precision counterpart, the CIE method effectively mitigates the performance degradation of BESTformer. Extensive experiments on static and neuromorphic datasets demonstrate that our method outperforms other binary SNNs, showcasing its potential as a compact yet high-performance model for resource-limited edge devices. The code is available at https://github.com/CaoHLin/BESTFormer.
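The abstract frames distillation as maximizing the mutual information between the binary model and its full-precision counterpart. The paper's exact objective is not reproduced here. As a stand-in, the sketch below uses standard temperature-scaled knowledge distillation (Hinton et al.), which aligns the binary student's output distribution with the full-precision teacher's. The separate distillation head follows the architecture overview, but the weighting `alpha`, the temperature, and all function names are assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits / temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """Temperature-scaled KL(teacher || student), multiplied by T^2.

    Here the full-precision model plays the teacher and the binary model
    plays the student.
    """
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_t, p_s))
    return kl * temperature ** 2


def total_loss(cls_logits, dist_logits, teacher_logits, label,
               alpha=0.5, temperature=4.0):
    """Hypothetical combination: cross-entropy on the classification head
    plus distillation on a separate distillation head, weighted by alpha."""
    ce = -math.log(softmax(cls_logits)[label])
    kd = kd_loss(dist_logits, teacher_logits, temperature)
    return (1 - alpha) * ce + alpha * kd
```

Keeping the classification and distillation heads separate lets the label loss and the teacher-matching loss pull on different outputs, so neither has to compromise the other; at inference the two heads' predictions can be averaged or one can be used alone.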

Paper Structure

This paper contains 23 sections, 15 equations, 5 figures, and 4 tables.

Introduction
Related Works
Transformer-based SNNs
Quantization techniques in SNNs
Method
Binary Event-Driven Spiking Transformer
Weight binarization
Attention binarization
Challenge analysis
Coupled information enhancement BESTformer
Reversible framework
Proposition 1.
Proposition 2.
Information enhanced distillation
Experiments
...and 8 more sections

Figures (5)

Figure 1: Accuracy vs. NS-ACE & Model Size. Our method achieves superior computational and storage efficiency while outperforming other quantized SNNs on ImageNet. Neuromorphic Synaptic Arithmetic Computation Effort (NS-ACE) assesses SNN resource use in neuromorphic computing environments shen2024conventional.
Figure 2: The 'LIF-B-Conv-BN' structure of BESTformer and the representation capability of the variables in it. The value set of a variable is the collection of all values it takes; the set size is the number of elements in that value set.
Figure 3: Overview of our BESTformer with the Coupled Information Enhancement method, which consists of a Binary Spiking Patch Splitting Module (BSPS), Reversible Binary Spiking Transformer Encoder Blocks, and Classification and Distillation Heads.
Figure 4: Illustration of the forward and inverse process of the proposed reversible framework. The inverse process indicates that the inputs can be reconstructed from the outputs, i.e., this framework is reversible and no information is lost.
Figure 5: Ablation study of the CIE method on CIFAR-100. (a) Impact of CIE on model accuracy across different architectures. (b) Comparison of information representation capability for models with and without the CIE method across different architecture depths.
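Figure 4 states that the encoder's inputs can be reconstructed from its outputs. One standard way to obtain this property is RevNet-style additive coupling, sketched below under the assumption that the input is split into two halves x1 and x2. Whether BESTformer uses exactly this partitioning and coupling is an assumption; `f` and `g` are hypothetical stand-ins for the binary attention and MLP sub-blocks.

```python
def f(v, threshold=1):
    """Hypothetical sub-block: a Heaviside spike function (not invertible)."""
    return [1 if t >= threshold else 0 for t in v]


def g(v):
    """Hypothetical sub-block: ReLU (also not invertible)."""
    return [max(t, 0) for t in v]


def reversible_forward(x1, x2):
    """Additive coupling: y1 = x1 + F(x2), y2 = x2 + G(y1)."""
    y1 = [a + b for a, b in zip(x1, f(x2))]
    y2 = [a + b for a, b in zip(x2, g(y1))]
    return y1, y2


def reversible_inverse(y1, y2):
    """Exact inverse: x2 = y2 - G(y1), then x1 = y1 - F(x2)."""
    x2 = [a - b for a, b in zip(y2, g(y1))]
    x1 = [a - b for a, b in zip(y1, f(x2))]
    return x1, x2


if __name__ == "__main__":
    x1, x2 = [0, 2, -1], [3, 0, 1]
    y1, y2 = reversible_forward(x1, x2)
    print(y1, y2)                          # [1, 2, 0] [4, 2, 1]
    print(reversible_inverse(y1, y2))      # ([0, 2, -1], [3, 0, 1])
```

Because the inverse only re-evaluates `f` and `g` and subtracts, neither sub-block has to be invertible, so a thresholded spike function is allowed. Integer inputs keep the reconstruction exact; with floating-point activations it holds up to rounding error.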
