RTFormer: Re-parameter TSBN Spiking Transformer

Hongzhi Wang, Xiubo Liang, Mengjian Li, Tao Zhang

TL;DR

RTFormer targets high accuracy in Spiking Neural Networks without sacrificing energy efficiency on neuromorphic hardware. It introduces a Spatial-Temporal Core that combines structurally reparameterized depthwise convolutions with Temporal Sliding Batch Normalization (TSBN), whose parameters are integrated into the neuron thresholds, yielding an energy-efficient Spiking Transformer. The approach reaches performance competitive with the state of the art on ImageNet and CIFAR-10/100 while reducing energy consumption, and it shows strong results on the neuromorphic datasets CIFAR10-DVS and DVS128 Gesture, owing to its enhanced temporal processing. The work also provides a detailed energy analysis and ablations that underscore the practical impact of TSBN and the reparameterized spatial blocks for real-world neuromorphic deployment.

Abstract

Spiking Neural Networks (SNNs), renowned for their bio-inspired operational mechanism and energy efficiency, mirror the human brain's neural activity. Yet SNNs struggle to balance energy efficiency with the computational demands of advanced tasks. Our research introduces RTFormer, a novel architecture that embeds Re-parameterized Temporal Sliding Batch Normalization (TSBN) within the Spiking Transformer framework. This design optimizes energy usage during inference while preserving robust computational performance. The crux of RTFormer is its integration of reparameterized convolutions and TSBN, which balances computational capability against energy conservation.
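
The TL;DR notes that TSBN is integrated into the neuron thresholds. The summary does not give the TSBN equations, so the following is only a minimal sketch of the general idea, under stated assumptions: a standard per-channel batch normalization followed by a stateless Heaviside spike comparison, with every scale `gamma` strictly positive. Membrane dynamics (leak, reset) and the temporal sliding window are not modeled, and all function names (`fold_bn_into_threshold`, `spikes_with_bn`, `spikes_folded`) are hypothetical. Under these assumptions, `gamma * (x - mean) / std + beta >= v_th` is equivalent to `x >= (v_th - beta) * std / gamma + mean`, so the normalization can be dropped at inference and replaced by a per-channel threshold.

```python
import numpy as np


def fold_bn_into_threshold(v_th, gamma, beta, mean, var, eps=1e-5):
    """Per-channel threshold equivalent to BN followed by a fixed threshold v_th.

    Assumption: gamma > 0 for every channel, so the inequality keeps its direction.
    """
    gamma = np.asarray(gamma, dtype=np.float64)
    if np.any(gamma <= 0):
        raise ValueError("folding assumes gamma > 0 for every channel")
    std = np.sqrt(np.asarray(var, dtype=np.float64) + eps)
    return (v_th - np.asarray(beta)) * std / gamma + np.asarray(mean)


def spikes_with_bn(x, v_th, gamma, beta, mean, var, eps=1e-5):
    """Reference path: normalize, then compare against the fixed threshold."""
    y = gamma * (x - mean) / np.sqrt(var + eps) + beta
    return (y >= v_th).astype(np.int8)


def spikes_folded(x, folded_th):
    """Folded path: compare the raw input against the per-channel threshold."""
    return (x >= folded_th).astype(np.int8)


# Illustrative random data; the last axis is the channel axis.
rng = np.random.default_rng(0)
channels = 8
x = rng.normal(size=(4, 16, channels))
gamma = rng.uniform(0.5, 1.5, size=channels)
beta = rng.normal(size=channels)
mean = rng.normal(size=channels)
var = rng.uniform(0.5, 2.0, size=channels)

folded = fold_bn_into_threshold(1.0, gamma, beta, mean, var)
same = np.array_equal(
    spikes_with_bn(x, 1.0, gamma, beta, mean, var),
    spikes_folded(x, folded),
)
print("spike trains identical:", same)
```

The folded path needs only one comparison per input element, which is the kind of saving that motivates moving normalization parameters into thresholds. Whether RTFormer uses exactly this algebra is not stated in this summary.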

Paper Structure

This paper contains 17 sections, 6 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Illustration of the structural reparameterization and simplification from a complex multi-branch system to a streamlined model. The top dashed box represents the parameters of the TSBN incorporated into Conv3, and the bottom dashed box represents the parameters of the TSBN incorporated into the trainable threshold.
  • Figure 2: Comparative visualization of a series of images across four columns: original images, Grad-CAM representations, Attention Maps, and Spiking Fire Rate (SFR) maps.
  • Figure 3: Two line graphs in which the blue line represents the baseline model and the green line indicates the performance after incorporating TSBN. The left graph shows the results on the CIFAR-10 dataset, and the right graph shows the results on the CIFAR10-DVS dataset.
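
The Figure 1 caption describes collapsing a multi-branch system into a streamlined model, with normalization parameters incorporated into a convolution. The exact branch layout is not given in this summary, so the sketch below assumes a RepVGG-style block: a 3x3 depthwise convolution followed by a standard batch normalization, a 1x1 depthwise branch, and an identity branch. All helper names (`depthwise_conv3x3`, `fuse_bn`, `merge_branches`) are hypothetical, and plain batch-norm statistics stand in for TSBN. It only illustrates that such a block can be merged into a single 3x3 depthwise convolution with the same output.

```python
import numpy as np


def depthwise_conv3x3(x, kernel, bias):
    """Depthwise 3x3 cross-correlation, stride 1, zero padding 1.

    x: (C, H, W), kernel: (C, 3, 3), bias: (C,)
    """
    _, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x, dtype=np.float64)
    for i in range(3):
        for j in range(3):
            out += kernel[:, i, j][:, None, None] * padded[:, i:i + h, j:j + w]
    return out + bias[:, None, None]


def fuse_bn(kernel, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold a per-channel batch normalization into the preceding depthwise conv."""
    scale = gamma / np.sqrt(var + eps)
    return kernel * scale[:, None, None], (bias - mean) * scale + beta


def merge_branches(k3, b3, k1, b1):
    """Merge 3x3 branch, 1x1 depthwise branch, and identity into one 3x3 kernel."""
    merged = k3.copy()
    merged[:, 1, 1] += k1 + 1.0  # 1x1 weight and identity both act on the center tap
    return merged, b3 + b1


rng = np.random.default_rng(0)
C = 4
x = rng.normal(size=(C, 6, 6))
k3, b3 = rng.normal(size=(C, 3, 3)), rng.normal(size=C)
k1, b1 = rng.normal(size=C), rng.normal(size=C)
gamma, beta = rng.uniform(0.5, 1.5, size=C), rng.normal(size=C)
mean, var = rng.normal(size=C), rng.uniform(0.5, 2.0, size=C)

# Multi-branch form: conv3x3 -> BN, plus a 1x1 branch, plus identity.
y3 = depthwise_conv3x3(x, k3, b3)
y3 = (gamma / np.sqrt(var + 1e-5))[:, None, None] * (y3 - mean[:, None, None]) \
    + beta[:, None, None]
y_multi = y3 + (k1[:, None, None] * x + b1[:, None, None]) + x

# Single-branch form after reparameterization.
k3_fused, b3_fused = fuse_bn(k3, b3, gamma, beta, mean, var)
k_merged, b_merged = merge_branches(k3_fused, b3_fused, k1, b1)
y_single = depthwise_conv3x3(x, k_merged, b_merged)

max_err = np.abs(y_multi - y_single).max()
print("outputs match:", np.allclose(y_multi, y_single))
```

The merge is exact up to floating-point rounding because every branch is linear in the input, so their kernels and biases simply add.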