STAA-SNN: Spatial-Temporal Attention Aggregator for Spiking Neural Networks
Tianqing Zhang, Kairong Yu, Xian Zhong, Hongwei Wang, Qi Xu, Qiang Zhang
TL;DR
This work tackles the performance gap between spiking neural networks (SNNs) and artificial neural networks (ANNs) by introducing STAA-SNN, a plug-and-play framework that jointly models spatial and temporal information. It combines a spike-driven self-attention mechanism, a learnable position encoding, a step attention module, and a time-step random dropout strategy to aggregate spatio-temporal features. The approach yields state-of-the-art results on the neuromorphic CIFAR10-DVS dataset, competitive event-based recognition on DVS128 Gesture, and strong accuracy on static datasets (CIFAR-10/100, ImageNet) with fewer time steps. These contributions advance energy-efficient, temporally aware SNNs for neuromorphic vision tasks, and the open-source code supports adoption and further research.
Abstract
Spiking Neural Networks (SNNs) have gained significant attention due to their biological plausibility and energy efficiency, making them promising alternatives to Artificial Neural Networks (ANNs). However, the performance gap between SNNs and ANNs remains a substantial challenge that hinders the widespread adoption of SNNs. In this paper, we propose a Spatial-Temporal Attention Aggregator SNN (STAA-SNN) framework, which dynamically focuses on and captures both spatial and temporal dependencies. First, we introduce a spike-driven self-attention mechanism specifically designed for SNNs. Second, we pioneer the use of position encoding to integrate latent temporal relationships into the incoming features. Third, for spatial-temporal information aggregation, we employ step attention to selectively amplify relevant features at different time steps. Finally, we implement a time-step random dropout strategy to avoid local optima. As a result, STAA-SNN captures both spatial and temporal dependencies, enabling the model to analyze complex patterns and make accurate predictions. The framework performs well across diverse datasets and generalizes strongly. Notably, STAA-SNN achieves state-of-the-art results on the neuromorphic dataset CIFAR10-DVS and reaches accuracies of 97.14%, 82.05%, and 70.40% on the static datasets CIFAR-10, CIFAR-100, and ImageNet, respectively. Furthermore, our model improves performance by 0.33% to 2.80% while using fewer time steps. The code for the model is available on GitHub.
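To make the last two components concrete, the following is a minimal, illustrative sketch of step attention followed by time-step random dropout on a toy sequence of per-step feature vectors. It is not the authors' implementation. The gating function (a sigmoid of the mean activation), the choice to zero out dropped steps, the rule that at least one step is always kept, and all function names are assumptions made for illustration only.

```python
import math
import random


def step_attention(features):
    """Scale each time step's feature vector by a gate in (0, 1).

    Assumption: the gate is a sigmoid of the step's mean activation;
    the scoring function used in the paper may differ.
    """
    gated = []
    for step in features:
        score = sum(step) / len(step)
        gate = 1.0 / (1.0 + math.exp(-score))
        gated.append([gate * x for x in step])
    return gated


def time_step_random_dropout(features, drop_prob, rng):
    """Zero out whole time steps, each independently with probability drop_prob.

    Assumption: dropped steps are zeroed rather than removed, and at least
    one step is always kept so the output is never entirely empty.
    """
    keep = [rng.random() >= drop_prob for _ in features]
    if not any(keep):
        keep[rng.randrange(len(features))] = True
    return [step if k else [0.0] * len(step) for step, k in zip(features, keep)]


def aggregate(features):
    """Average the feature vectors over time steps."""
    num_steps = len(features)
    return [sum(column) / num_steps for column in zip(*features)]


# Toy input: T = 3 time steps, each with a 2-dimensional feature vector.
toy_features = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
rng = random.Random(0)  # seeded for reproducibility
gated = step_attention(toy_features)
dropped = time_step_random_dropout(gated, drop_prob=0.3, rng=rng)
pooled = aggregate(dropped)
```

In this sketch the dropout would apply only during training, and at inference `aggregate(step_attention(features))` would be used directly. That split is also an assumption rather than a detail stated in the abstract.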
