ReSpike: Residual Frames-based Hybrid Spiking Neural Networks for Efficient Action Recognition
Shiting Xiao, Yuhang Li, Youngeun Kim, Donghyun Lee, Priyadarshini Panda
TL;DR
ReSpike addresses the challenge of energy-efficient action recognition by integrating Spiking Neural Networks (SNNs) with Artificial Neural Networks (ANNs) through a novel Key-Residual input representation. The method assigns spatial learning to the ANN via RGB Key Frames and temporal dynamics to the SNN via Residual Frames, then fuses them with a multi-scale cross-attention mechanism. It achieves state-of-the-art or competitive accuracy on HMDB-51, UCF-101, and Kinetics-400, including a first direct-SNN result on Kinetics-400 with 70.1% accuracy, while delivering favorable energy-accuracy trade-offs (up to ~6.8x energy reduction over 3D CNN baselines). The approach is trained end-to-end with Spatio-Temporal Back-Propagation and surrogate gradients, and ablation studies with attention-map visualizations corroborate the effectiveness of key-residual representations and cross-modal fusion for dynamic scene understanding.
Abstract
Spiking Neural Networks (SNNs) have emerged as a compelling, energy-efficient alternative to traditional Artificial Neural Networks (ANNs) for static image tasks such as image classification and segmentation. However, in the more complex video classification domain, SNN-based methods fall considerably short of ANN-based benchmarks because of the challenges of processing dense frame sequences. To bridge this gap, we propose ReSpike, a hybrid framework that combines the strengths of ANNs and SNNs to tackle action recognition tasks with high accuracy and low energy cost. By decomposing video clips into spatial and temporal components, i.e., RGB image Key Frames and event-like Residual Frames, ReSpike uses an ANN to learn spatial information and an SNN to learn temporal information. In addition, we propose a multi-scale cross-attention mechanism for effective feature fusion. Compared to state-of-the-art SNN baselines, our ReSpike hybrid architecture demonstrates significant performance improvements (e.g., >30% absolute accuracy improvement on HMDB-51, UCF-101, and Kinetics-400). Furthermore, ReSpike achieves performance comparable to prior ANN approaches while offering a better accuracy-energy tradeoff.
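The Key-Residual decomposition described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `key_residual_split`, the `stride` parameter, the choice of the first frame of each window as the Key Frame, and the definition of a Residual Frame as the difference between consecutive frames are all assumptions made for illustration.

```python
import numpy as np


def key_residual_split(clip, stride):
    """Split a clip of shape (T, H, W, C) into key and residual frames.

    Assumptions (not taken from the paper): the clip is cut into
    non-overlapping windows of `stride` frames, the first frame of each
    window is the Key Frame, and each Residual Frame is the difference
    between two consecutive frames inside the window.
    """
    clip = np.asarray(clip, dtype=np.float32)
    num_windows = clip.shape[0] // stride
    # Drop trailing frames that do not fill a whole window.
    windows = clip[: num_windows * stride].reshape(
        num_windows, stride, *clip.shape[1:]
    )
    key_frames = windows[:, 0]                          # (N, H, W, C)
    residual_frames = windows[:, 1:] - windows[:, :-1]  # (N, stride-1, H, W, C)
    return key_frames, residual_frames
```

Under these assumptions, a clip with no motion yields all-zero residuals, which is one way to see why Residual Frames are sparse and event-like, while the Key Frames carry the full RGB appearance.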
