STAA-SNN: Spatial-Temporal Attention Aggregator for Spiking Neural Networks

Tianqing Zhang, Kairong Yu, Xian Zhong, Hongwei Wang, Qi Xu, Qiang Zhang

TL;DR

This work tackles the performance gap between spiking neural networks (SNNs) and artificial neural networks (ANNs) by introducing STAA-SNN, a plug-and-play framework that jointly models spatial and temporal information. It combines a spike-driven self-attention mechanism, a learnable position encoding, a step attention module, and a time-step random dropout strategy to aggregate spatio-temporal features robustly. The approach yields state-of-the-art results on the neuromorphic CIFAR10-DVS dataset, competitive event-based recognition on DVS128 Gesture, and strong accuracy on static datasets (CIFAR-10/100, ImageNet) with fewer time steps. These contributions advance energy-efficient, temporally aware SNNs for neuromorphic vision tasks, and the open-source code supports adoption and further research.

Abstract

Spiking Neural Networks (SNNs) have gained significant attention due to their biological plausibility and energy efficiency, making them promising alternatives to Artificial Neural Networks (ANNs). However, the performance gap between SNNs and ANNs remains a substantial challenge hindering the widespread adoption of SNNs. In this paper, we propose a Spatial-Temporal Attention Aggregator SNN (STAA-SNN) framework, which dynamically focuses on and captures both spatial and temporal dependencies. First, we introduce a spike-driven self-attention mechanism specifically designed for SNNs. Additionally, we pioneer the use of position encoding to integrate latent temporal relationships into the incoming features. For spatial-temporal information aggregation, we employ step attention to selectively amplify relevant features at different time steps. Finally, we implement a time-step random dropout strategy to avoid local optima. As a result, STAA-SNN effectively captures both spatial and temporal dependencies, enabling the model to analyze complex patterns and make accurate predictions. The framework performs strongly across diverse datasets and generalizes well. Notably, STAA-SNN achieves state-of-the-art results on the neuromorphic dataset CIFAR10-DVS, and accuracies of 97.14%, 82.05% and 70.40% on the static datasets CIFAR-10, CIFAR-100 and ImageNet, respectively. Furthermore, our model improves performance by 0.33% to 2.80% while using fewer time steps. The code for the model is available on GitHub.
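As a rough illustration of two components named in the abstract, the sketch below shows one possible form of step attention and time-step random dropout in plain Python. It is not the authors' implementation. The function names, the softmax-over-mean-activation weighting, and the rule of dropping each time step independently with probability `beta` (while always keeping at least one step) are assumptions made purely for illustration.

```python
import math
import random


def step_attention(features):
    """Reweight each time step by a softmax over its mean activation.

    features: list of T lists of floats, one flat feature vector per time step.
    Returns (scaled_features, weights). The weighting rule is an assumption,
    not the scheme used in the paper.
    """
    means = [sum(f) / len(f) for f in features]
    peak = max(means)  # subtract the max for numerical stability
    exps = [math.exp(m - peak) for m in means]
    total = sum(exps)
    weights = [e / total for e in exps]
    scaled = [[w * x for x in f] for w, f in zip(weights, features)]
    return scaled, weights


def time_step_random_dropout(features, beta, rng, training=True):
    """Zero out whole time steps with probability beta during training.

    Assumed rule (hypothetical): each step is dropped independently, and at
    least one step is always kept. At inference the input passes through.
    """
    if not training or beta <= 0.0:
        return [list(f) for f in features]
    keep = [rng.random() >= beta for _ in features]
    if not any(keep):
        keep[rng.randrange(len(features))] = True
    return [list(f) if k else [0.0] * len(f) for f, k in zip(features, keep)]


# Toy usage: T = 3 time steps, 4 binary spike features per step.
rng = random.Random(0)
spikes = [[0.0, 1.0, 0.0, 0.0], [1.0, 1.0, 0.0, 1.0], [0.0, 0.0, 0.0, 0.0]]
dropped = time_step_random_dropout(spikes, beta=0.3, rng=rng)
scaled, weights = step_attention(dropped)
```

In this toy form, time steps with more spike activity receive larger weights, and the dropout step forces the model to avoid relying on any single time step during training.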

Paper Structure

This paper contains 44 sections, 11 equations, 5 figures, 7 tables.

Figures (5)

  • Figure 1: Overview of the STAA-SNN Architecture and the TSRD strategy.
  • Figure 2: Position encoding locations in SNNs.
  • Figure 3: Distribution of accuracy with different dropout probability $\beta$ in TSRD on CIFAR-10.
  • Figure 4: Visualization on CIFAR10-DVS. Ten layers from VGG-13, ordered from shallow to deep.
  • Figure 5: Impact of different scaling coefficients $r$ for intermediate feature dimensions in the GC module on the CIFAR-10 dataset.