Spiking Wavelet Transformer

Yuetong Fang; Ziqing Wang; Lingfeng Zhang; Jiahang Cao; Honglei Chen; Renjing Xu

TL;DR

This paper proposes the Spiking Wavelet Transformer (SWformer). SWformer is an attention-free architecture that uses the sparse wavelet transform to learn comprehensive spatial-frequency features in a spike-driven manner, and it outperforms state-of-the-art SNNs.

Abstract

Spiking neural networks (SNNs) offer an energy-efficient alternative to conventional deep learning by emulating the event-driven processing of the brain. Integrating Transformers into SNNs has shown promise for improving accuracy. However, these Spiking Transformers rely on the global self-attention mechanism, so they struggle to learn high-frequency patterns such as moving edges and pixel-level brightness changes. Learning these high-frequency representations is challenging but essential for SNN-based event-driven vision. To address this issue, we propose the Spiking Wavelet Transformer (SWformer), an attention-free architecture that uses the sparse wavelet transform to learn comprehensive spatial-frequency features in a spike-driven manner. Its critical component is a Frequency-Aware Token Mixer (FATM) with three branches: 1) a spiking wavelet learner for spatial-frequency domain learning, 2) a convolution-based learner for spatial feature extraction, and 3) a spiking pointwise convolution for cross-channel information aggregation. Branch 1) also incorporates negative spike dynamics to enhance frequency representation. Our empirical results show that, with the FATM, SWformer captures high-frequency visual components better than vanilla Spiking Transformers. Experiments on both static and neuromorphic datasets show that SWformer captures spatial-frequency patterns in a multiplication-free and event-driven fashion and outperforms state-of-the-art SNNs. On ImageNet, SWformer reduces the parameter count by 22.03% and improves performance by 2.52% compared with vanilla Spiking Transformers. The code is available at: https://github.com/bic-L/Spiking-Wavelet-Transformer.
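The sketch below shows in one dimension why negative spikes matter for a wavelet branch. It is a minimal pure-Python illustration, not the authors' implementation. It assumes a single-level orthonormal Haar transform and two hypothetical threshold neurons: `binary_spikes`, which can only emit 0 or 1, and `ternary_spikes`, which can also emit -1. It also assumes that each spike is decoded back to a fixed amplitude `amp`. Haar detail coefficients are differences of neighbouring values, so they take both signs. The binary encoding discards the negative ones, and the ternary encoding keeps them.

```python
import math

def haar_1d(x):
    """Single-level orthonormal 1D Haar transform of an even-length list."""
    s = 1 / math.sqrt(2)
    approx = [s * (x[i] + x[i + 1]) for i in range(0, len(x), 2)]  # low-frequency part
    detail = [s * (x[i] - x[i + 1]) for i in range(0, len(x), 2)]  # high-frequency part, signed
    return approx, detail

def inverse_haar_1d(approx, detail):
    """Exact inverse of haar_1d."""
    s = 1 / math.sqrt(2)
    out = []
    for a, d in zip(approx, detail):
        out.extend([s * (a + d), s * (a - d)])
    return out

def binary_spikes(coeffs, theta):
    # Hypothetical binary neuron: emits 1 only when a value reaches +theta.
    return [1 if c >= theta else 0 for c in coeffs]

def ternary_spikes(coeffs, theta):
    # Hypothetical ternary neuron: emits +1 or -1 when |value| reaches theta, else 0.
    return [1 if c >= theta else (-1 if c <= -theta else 0) for c in coeffs]

def reconstruct(signal, spike_fn, theta, amp):
    """Haar-transform, spike-encode, decode each spike to +/-amp, then invert."""
    approx, detail = haar_1d(signal)
    a_hat = [amp * s for s in spike_fn(approx, theta)]
    d_hat = [amp * s for s in spike_fn(detail, theta)]
    return inverse_haar_1d(a_hat, d_hat)

def mse(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

# Alternating falling/rising edges give Haar detail coefficients of both signs.
signal = [1, 0, 0, 1, 1, 0, 0, 1]
theta = 0.5
amp = 1 / math.sqrt(2)
print(mse(signal, reconstruct(signal, binary_spikes, theta, amp)))   # about 0.125
print(mse(signal, reconstruct(signal, ternary_spikes, theta, amp)))  # about 0 (float rounding only)
```

With binary spikes, each negative detail coefficient becomes 0. The rising edges are then smeared into (0.5, 0.5) pairs. Under these assumptions, the ternary code recovers the signal. Figure 4(a) makes the corresponding comparison on images. It uses PSNR rather than MSE.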

Paper Structure

This paper contains 26 sections, 8 equations, 6 figures, and 4 tables.

Introduction
Preliminary
Bio-inspired Spiking Neural Networks
Neuromorphic Chips
Spiking Vision Transformers
Learning in the Frequency Domain
Spiking Wavelet Transformer
Overall Architecture
Frequency-Aware Token Mixer
Frequency Learner
Frequency representation in SNNs
Modularized Weight Matrix
Membrane Shortcut
Experiment
Experiment Setup
...and 11 more sections

Figures (6)

Figure 1: (a) Top-1 accuracy and energy consumption of SWformer and other SOTA SNN models (details in the supplementary material). Marker size reflects model size. (b) Fourier spectra of the Spiking Transformer with global attention yao2023spike (top) and of SWformer (bottom). Brighter colors indicate higher magnitudes. (c) Corresponding relative log amplitudes of the Fourier-transformed feature maps. Panels (b) and (c) show that SWformer captures more high-frequency signals, leading to better performance.
Figure 2: Processing flow of a synapse block. Neuromorphic chips follow a spike-based computation paradigm in which both inputs and outputs are spikes davies2018loihi.
Figure 3: Overview of SWformer. We present two main innovations inspired by zhou2022spikformer. First, FATM improves frequency perception in Spiking Transformers using only Conv and MLP operations, which keeps it compatible with neuromorphic hardware. Second, our Frequency Learner (FL) efficiently captures spectral features through spiking frequency representation and block-diagonal multiplication. ConvBN: a Conv layer followed by a BN layer.
Figure 4: (a) Comparison of the standard Haar transform, the binary spiking Haar transform, and the ternary spiking Haar transform. Higher Peak Signal-to-Noise Ratio (PSNR) values indicate greater similarity between images. (b) Schematic of a block-diagonal matrix.
Figure 5: Mainstream shortcut schemes in SNNs.
...and 1 more figure
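Figures 3 and 4(b) mention block-diagonal multiplication, and Figure 2 describes spike-in, spike-out synapse computation. The sketch below combines these two ideas under stated assumptions. It illustrates the general technique, not the paper's Frequency Learner. The hypothetical helper `block_diag_spike_matmul` applies independent b×b weight blocks to consecutive channel groups of a ternary spike vector. Each output is therefore formed by additions and subtractions only. The hypothetical `param_counts` shows that splitting a d×d matrix into k blocks reduces the weight count from d² to d²/k.

```python
from typing import List, Tuple

Matrix = List[List[float]]

def block_diag_spike_matmul(blocks: List[Matrix], spikes: List[int]) -> List[float]:
    """Apply a block-diagonal weight matrix to a ternary spike vector.

    blocks[k] is the b x b weight block for channel group k; the full
    (k*b) x (k*b) matrix is never materialised. Since every spike is -1, 0
    or +1, each output is a sum/difference of weights, with no multiplications.
    """
    b = len(blocks[0])
    if len(spikes) != b * len(blocks):
        raise ValueError("spike vector length must equal num_blocks * block_size")
    out = []
    for k, w in enumerate(blocks):
        group = spikes[k * b:(k + 1) * b]  # channels routed to block k only
        for row in w:
            acc = 0.0
            for weight, s in zip(row, group):
                if s == 1:
                    acc += weight  # positive spike: accumulate
                elif s == -1:
                    acc -= weight  # negative spike: subtract
                # s == 0: no event, so no work is done
            out.append(acc)
    return out

def param_counts(dim: int, num_blocks: int) -> Tuple[int, int]:
    """Weight count of a dense dim x dim matrix vs. a block-diagonal one."""
    if dim % num_blocks:
        raise ValueError("dim must be divisible by num_blocks")
    b = dim // num_blocks
    return dim * dim, num_blocks * b * b

# Two 2x2 blocks acting on a 4-channel ternary spike vector.
blocks = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
print(block_diag_spike_matmul(blocks, [1, -1, 0, 1]))  # [-1.0, -1.0, 6.0, 8.0]
print(param_counts(512, 8))  # (262144, 32768)
```

The result matches the dense 4×4 matrix with these blocks on its diagonal. The zero off-diagonal entries are simply skipped. The 512-channel, 8-block split is an arbitrary example, not a configuration or parameter count taken from the paper.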
