Temporal Dynamics Enhancer for Directly Trained Spiking Object Detectors
Fan Luo, Zeyu Gao, Xinhao Luo, Kai Zhao, Yanfeng Lu
TL;DR
The paper identifies a bottleneck in temporal modeling for directly trained SNN object detectors: neurons receive nearly identical stimuli at every time step. It proposes the Temporal Dynamics Enhancer (TDE), which pairs a Spiking Encoder (SE) with an Attention Gating Module (AGM), and adds Spike-Driven Attention (SDA) so that the attention computation remains spike-driven and energy-efficient. Integrated into existing SNN detectors, TDE yields consistent improvements on static (PASCAL VOC) and neuromorphic (EvDET200K) datasets, and the SDA design consumes only 0.240 times the energy of conventional attention modules. The approach is shown to apply across multiple detectors with modest parameter overhead, offering a practical path toward more expressive and energy-efficient spike-based object detection.
Abstract
Spiking Neural Networks (SNNs), with their brain-inspired spatiotemporal dynamics and spike-driven computation, have emerged as promising energy-efficient alternatives to Artificial Neural Networks (ANNs). However, existing SNNs typically replicate inputs directly or aggregate them into frames at fixed intervals. Such strategies lead to neurons receiving nearly identical stimuli across time steps, severely limiting the model's expressive power, particularly in complex tasks like object detection. In this work, we propose the Temporal Dynamics Enhancer (TDE) to strengthen SNNs' capacity for temporal information modeling. TDE consists of two modules: a Spiking Encoder (SE) that generates diverse input stimuli across time steps, and an Attention Gating Module (AGM) that guides the SE generation based on inter-temporal dependencies. Moreover, to eliminate the high-energy multiplication operations introduced by the AGM, we propose Spike-Driven Attention (SDA), which reduces attention-related energy consumption. Extensive experiments demonstrate that TDE can be seamlessly integrated into existing SNN-based detectors and consistently outperforms state-of-the-art methods, achieving mAP50-95 scores of 57.7% on the static PASCAL VOC dataset and 47.6% on the neuromorphic EvDET200K dataset. In terms of energy consumption, the SDA consumes only 0.240 times the energy of conventional attention modules.
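To make the motivating problem concrete, the following is a minimal sketch of the baseline input strategy the abstract criticizes, in which a static frame is replicated at every time step. It is not the authors' SE, AGM, or SDA. The function names, the simple leaky integrate-and-fire (LIF) neuron, and the threshold and decay values are all assumptions chosen for illustration.

```python
def replicate_input(frame, num_steps):
    """Direct encoding: present the same static frame at every time step."""
    return [list(frame) for _ in range(num_steps)]


def temporal_variance(stimuli):
    """Mean per-pixel variance across time steps (0.0 means identical stimuli)."""
    num_steps = len(stimuli)
    num_pixels = len(stimuli[0])
    total = 0.0
    for i in range(num_pixels):
        values = [stimuli[t][i] for t in range(num_steps)]
        mean = sum(values) / num_steps
        total += sum((v - mean) ** 2 for v in values) / num_steps
    return total / num_pixels


def lif_spikes(currents, threshold=1.0, decay=0.5):
    """Leaky integrate-and-fire neuron with hard reset (illustrative parameters)."""
    membrane = 0.0
    spikes = []
    for current in currents:
        membrane = decay * membrane + current  # leak, then integrate the input
        if membrane >= threshold:
            spikes.append(1)
            membrane = 0.0  # hard reset after firing
        else:
            spikes.append(0)
    return spikes


# Hypothetical three-pixel "image" with intensities in [0, 1].
frame = [0.2, 0.6, 0.9]
num_steps = 4
stimuli = replicate_input(frame, num_steps)
variance = temporal_variance(stimuli)

# One spike train per pixel, driven by the replicated input.
spike_trains = [
    lif_spikes([stimuli[t][i] for t in range(num_steps)])
    for i in range(len(frame))
]
```

Under these assumptions the stimulus carries no variation over time, so each neuron's spike train is fully determined by a single constant intensity. This is the limitation that an encoder producing diverse stimuli across time steps is meant to address.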
