Temporal Dynamics Enhancer for Directly Trained Spiking Object Detectors

Fan Luo, Zeyu Gao, Xinhao Luo, Kai Zhao, Yanfeng Lu

TL;DR

The paper identifies a temporal-modeling bottleneck in directly trained SNN object detectors: neurons receive nearly identical stimuli at every time step. To address it, the authors propose the Temporal Dynamics Enhancer (TDE), which pairs a Spiking Encoder (SE) with an Attention Gating Module (AGM), and add Spike-Driven Attention (SDA) to remove the multiplications that the AGM would otherwise introduce. Integrated into existing SNN detectors, TDE reaches 57.7% mAP50-95 on the static PASCAL VOC dataset and 47.6% on the neuromorphic EvDET200K dataset, while the SDA consumes 0.240 times the energy of conventional attention modules. The approach is shown to apply across multiple detectors with modest parameter overhead, which makes it a practical add-on for spike-based object detection.

Abstract

Spiking Neural Networks (SNNs), with their brain-inspired spatiotemporal dynamics and spike-driven computation, have emerged as promising energy-efficient alternatives to Artificial Neural Networks (ANNs). However, existing SNNs typically replicate inputs directly or aggregate them into frames at fixed intervals. Such strategies lead to neurons receiving nearly identical stimuli across time steps, severely limiting the model's expressive power, particularly in complex tasks like object detection. In this work, we propose the Temporal Dynamics Enhancer (TDE) to strengthen SNNs' capacity for temporal information modeling. TDE consists of two modules: a Spiking Encoder (SE) that generates diverse input stimuli across time steps, and an Attention Gating Module (AGM) that guides the SE generation based on inter-temporal dependencies. Moreover, to eliminate the high-energy multiplication operations introduced by the AGM, we propose a Spike-Driven Attention (SDA) to reduce attention-related energy consumption. Extensive experiments demonstrate that TDE can be seamlessly integrated into existing SNN-based detectors and consistently outperforms state-of-the-art methods, achieving mAP50-95 scores of 57.7% on the static PASCAL VOC dataset and 47.6% on the neuromorphic EvDET200K dataset. In terms of energy consumption, the SDA consumes only 0.240 times the energy of conventional attention modules.
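To make the "nearly identical stimuli" problem concrete, here is a minimal sketch of a single leaky integrate-and-fire (LIF) neuron. It assumes a common discrete LIF form with a hard reset; the function name `lif_spikes` and the parameters `tau`, `v_th`, and `v_reset` are illustrative choices, not the paper's exact formulation. With this update rule, a constant input drives the membrane potential toward the input value, so a repeated input below the threshold never produces a spike, whereas a time-varying input with the same mean can.

```python
def lif_spikes(inputs, tau=2.0, v_th=1.0, v_reset=0.0):
    """Simulate one LIF neuron (assumed generic form) and return its 0/1 spike train."""
    v = v_reset
    spikes = []
    for x in inputs:
        # Charging step: leak toward the reset potential and integrate the input.
        v = v + (x - (v - v_reset)) / tau
        # Firing step: emit a spike when the membrane potential reaches the threshold.
        fired = 1 if v >= v_th else 0
        spikes.append(fired)
        # Hard reset after a spike.
        if fired:
            v = v_reset
    return spikes


# The same input repeated at every time step (sub-threshold, mean 0.8).
repeated = [0.8, 0.8, 0.8, 0.8]
# A time-varying input with the same mean of 0.8.
diverse = [0.2, 0.2, 0.4, 2.4]

print(lif_spikes(repeated))  # [0, 0, 0, 0]
print(lif_spikes(diverse))   # [0, 0, 0, 1]
```

In this toy setting, adding more time steps to the repeated input does not help: the potential only converges toward 0.8 and the neuron stays silent. This is the kind of limitation the SE is meant to address by supplying different stimuli at different time steps.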

Paper Structure

This paper contains 19 sections, 7 equations, 4 figures, 4 tables, 1 algorithm.

Figures (4)

  • Figure 1: Research Motivation of the TDE Module. (a) Illustrates the spike firing pattern of the first LIF neuron layer before and after applying TDE. (b) Describes how LIF neurons in existing SNNs receive repetitive input stimuli, resulting in the disappearance of a series of spike streams. (c) Highlights that traditional attention mechanisms in SNNs introduce a large number of energy-intensive multiplication operations.
  • Figure 2: The Temporal Dynamics Enhancer (TDE) consists of two main components: (1) The spiking encoder (SE), using the CB component (Conv-BN), triggers the generation of diverse spikes. Its connection mechanism is based on the charging equation of the LIF neuron and the firing equation. (2) The Attention Gating Module (AGM) enhances the temporal dynamics of neurons within the layer by utilizing a general multi-dimensional attention mechanism. At the same time, it obtains temporal attention weights to regulate the spike stream generation of the SE, suppressing unreasonable exploration.
  • Figure 3: SDA: Schematic of the Spike-Driven Attention. SDA uses two neuron groups to avoid multiplication operations. It fuses the spike and floating-point temporal, spatial, and channel attention weights with cross-attention to obtain the attention-updated membrane potential, eliminating the need for Hadamard multiplication and matrix multiplication operations in the membrane potential update.
  • Figure 4: With TDE, the network gradually shifts attention from the object to surrounding regions over time, while the baseline shows mostly static feature maps.
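
The captions for Figures 1(c) and 3 rest on a general property of spike-driven computation: when one operand is a binary spike tensor, Hadamard and matrix products can be computed with selection and addition alone. The sketch below illustrates only that general property with NumPy. It is not the SDA design, whose two neuron groups and cross-attention are not specified in this summary, and the helper names `spike_hadamard` and `spike_matmul` are hypothetical.

```python
import numpy as np


def spike_hadamard(spikes, values):
    """Element-wise product with 0/1 spikes, done as a selection (no multiplies)."""
    return np.where(spikes.astype(bool), values, 0.0)


def spike_matmul(spikes, weights):
    """Compute spikes @ weights for 0/1 spikes using only row additions."""
    out = np.zeros((spikes.shape[0], weights.shape[1]), dtype=weights.dtype)
    for i in range(spikes.shape[0]):
        # Indices of the inputs that fired in this row.
        active = np.flatnonzero(spikes[i])
        # Accumulate the weight rows of the active inputs; silent inputs cost nothing.
        out[i] = weights[active].sum(axis=0)
    return out


rng = np.random.default_rng(0)
spikes = (rng.random((3, 5)) > 0.5).astype(np.int64)  # binary spike matrix
weights = rng.standard_normal((5, 4))                 # floating-point weights
values = rng.standard_normal((3, 5))                  # floating-point features

# Both results match the multiplication-based versions.
print(np.allclose(spike_matmul(spikes, weights), spikes @ weights))
print(np.allclose(spike_hadamard(spikes, values), spikes * values))
```

A conventional attention module multiplies floating-point weights with floating-point features, so this shortcut does not apply to it directly, which is the problem Figure 1(c) points at.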