
Spiking Wavelet Transformer

Yuetong Fang, Ziqing Wang, Lingfeng Zhang, Jiahang Cao, Honglei Chen, Renjing Xu

TL;DR

This paper proposes the Spiking Wavelet Transformer (SWformer), an attention-free architecture that learns comprehensive spatial-frequency features in a spike-driven manner by leveraging the sparse wavelet transform, and that outperforms state-of-the-art SNNs.

Abstract

Spiking neural networks (SNNs) offer an energy-efficient alternative to conventional deep learning by emulating the brain's event-driven processing. Incorporating Transformers with SNNs has shown promise for accuracy. However, these Spiking Transformers struggle to learn high-frequency patterns, such as moving edges and pixel-level brightness changes, because they rely on the global self-attention mechanism. Learning these high-frequency representations is challenging but essential for SNN-based event-driven vision. To address this issue, we propose the Spiking Wavelet Transformer (SWformer), an attention-free architecture that effectively learns comprehensive spatial-frequency features in a spike-driven manner by leveraging the sparse wavelet transform. The critical component is a Frequency-Aware Token Mixer (FATM) with three branches: 1) a spiking wavelet learner for spatial-frequency domain learning, 2) a convolution-based learner for spatial feature extraction, and 3) a spiking pointwise convolution for cross-channel information aggregation, with negative spike dynamics incorporated in 1) to enhance frequency representation. The FATM enables the SWformer to outperform vanilla Spiking Transformers in capturing high-frequency visual components, as evidenced by our empirical results. Experiments on both static and neuromorphic datasets demonstrate SWformer's effectiveness in capturing spatial-frequency patterns in a multiplication-free and event-driven fashion, outperforming state-of-the-art SNNs. SWformer achieves a 22.03% reduction in parameter count and a 2.52% performance improvement on the ImageNet dataset compared to vanilla Spiking Transformers. The code is available at: https://github.com/bic-L/Spiking-Wavelet-Transformer.
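
To make the "multiplication-free" and "negative spike" ideas in the abstract concrete, here is a minimal NumPy sketch of a one-level, unnormalized 2D Haar transform applied to a binary spike map. It is an illustration under stated assumptions, not the authors' implementation: the function name `haar2d_addsub`, the subband names, and the toy input are hypothetical. The authors' repository may use a different normalization or subband convention. The sketch shows that the transform needs only additions and subtractions on spike inputs, and that the detail subbands can be negative even when the input is binary, which is one plausible reason a signed (ternary) spike representation helps.

```python
import numpy as np

def haar2d_addsub(x):
    """One level of an unnormalized 2D Haar transform using only + and -.

    x: integer array of shape (H, W) with even H and W (e.g. a spike map).
    Returns four subbands, each of shape (H // 2, W // 2).
    Assumption: no 1/2 scaling is applied, so outputs stay integer-valued.
    """
    a = x[0::2, 0::2]  # top-left element of every 2x2 patch
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    approx = a + b + c + d   # low-pass in both directions
    d_cols = a - b + c - d   # left column minus right column
    d_rows = a + b - c - d   # top row minus bottom row
    d_diag = a - b - c + d   # diagonal difference
    return approx, d_cols, d_rows, d_diag

# Hypothetical 4x4 binary spike map (values in {0, 1}).
spikes = np.array([
    [0, 1, 1, 1],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 1],
])

approx, d_cols, d_rows, d_diag = haar2d_addsub(spikes)

# Cross-check one patch against the matrix form H @ P @ H.T with H = [[1, 1], [1, -1]].
H = np.array([[1, 1], [1, -1]])
patch = spikes[0:2, 0:2]
matrix_form = H @ patch @ H.T
```

For binary inputs the approximation subband lies in [0, 4] and each detail subband lies in [-2, 2], so a purely binary output code cannot represent the negative detail coefficients without an extra signed state.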

Paper Structure (26 sections, 8 equations, 6 figures, 4 tables)

Figures (6)

  • Figure 1: (a) Performance of SWformer and other SOTA SNN models in top-1 accuracy and energy consumption (details in the supplementary material), with marker size reflecting model size. (b) Fourier spectrum comparison between the Spiking Transformer with global attention [yao2023spike] (top) and SWformer (bottom). Brighter colors indicate higher magnitudes. (c) Corresponding relative log amplitudes of Fourier-transformed feature maps. Panels (b) and (c) show that SWformer captures more high-frequency signals, leading to better performance.
  • Figure 2: Processing flow of a synapse block. Neuromorphic chips follow a spike-based computation paradigm, where both inputs and outputs are in spike form [davies2018loihi].
  • Figure 3: Overview of SWformer. We present two main innovations inspired by [zhou2022spikformer]. First, FATM improves frequency perception in Spiking Transformers using only Conv and MLP operations, ensuring compatibility with neuromorphic hardware. Second, our Frequency Learner (FL) efficiently captures spectral features through spiking frequency representation and block-diagonal multiplication. ConvBN: a Conv layer followed by a BN layer.
  • Figure 4: (a) Comparison of the standard Haar transform, binary spiking Haar transform, and ternary spiking Haar transform. Higher Peak Signal-to-Noise Ratio (PSNR) values indicate greater similarity between the images. (b) Schematic of the block-diagonal matrix.
  • Figure 5: Mainstream shortcut schemes in SNNs.
  • ...and 1 more figures
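
The block-diagonal multiplication named in the Figure 3 and Figure 4(b) captions can be illustrated with a short NumPy sketch. This is a generic illustration and not the paper's Frequency Learner: the shapes, the block count, and the helper names `block_diag_matmul` and `to_dense` are hypothetical. The idea it shows is that splitting the channels into K groups and mixing each group with its own small weight block gives the same result as multiplying by one large block-diagonal matrix, while storing K times fewer weights.

```python
import numpy as np

def block_diag_matmul(x, blocks):
    """Multiply x by a block-diagonal matrix without materializing it.

    x:      (N, C) input, e.g. N tokens with C channels.
    blocks: (K, C // K, C // K) one small weight block per channel group.
    Returns an (N, C) array.
    """
    n, c = x.shape
    k, b, _ = blocks.shape
    assert c == k * b, "channels must split evenly into K blocks"
    x_groups = x.reshape(n, k, b)                         # split channels into K groups
    y_groups = np.einsum("nkb,kbd->nkd", x_groups, blocks)  # per-group mixing
    return y_groups.reshape(n, c)

def to_dense(blocks):
    """Build the equivalent dense (C, C) block-diagonal matrix (for checking only)."""
    k, b, _ = blocks.shape
    dense = np.zeros((k * b, k * b), dtype=blocks.dtype)
    for i in range(k):
        dense[i * b:(i + 1) * b, i * b:(i + 1) * b] = blocks[i]
    return dense

rng = np.random.default_rng(0)
# Hypothetical ternary input in {-1, 0, 1}: 5 tokens, 8 channels.
x = rng.integers(-1, 2, size=(5, 8)).astype(np.float64)
# Hypothetical weights: 4 blocks of size 2x2.
blocks = rng.standard_normal((4, 2, 2))

y_blockwise = block_diag_matmul(x, blocks)
y_dense = x @ to_dense(blocks)

block_params = blocks.size          # 4 * 2 * 2 = 16 stored weights
dense_params = to_dense(blocks).size  # 8 * 8 = 64 entries, mostly zeros
```

In this toy setting the block-wise form stores 16 weights instead of 64. Because the input entries are in {-1, 0, 1}, each product reduces to adding, subtracting, or skipping a weight.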