Driving in Spikes: An Entropy-Guided Object Detector for Spike Cameras

Ziyan Liu, Qi Su, Lulu Tang, Zhaofei Yu, Tiejun Huang

TL;DR

This work tackles object detection for spike cameras in high-speed driving, where traditional frame-based detectors struggle under motion and lighting extremes. It introduces EASD, a dual-branch, end-to-end spike detector that combines a Temporal-Based Texture branch with a global fusion pathway and an Entropy Selective Attention branch for object-centric refinement, enabling robust detection from sparse spike streams. To close the data gap, the authors build DSEC-Spike, a spike-based driving benchmark, and demonstrate state-of-the-art performance on both synthetic spike data and real spike streams, with notable simulation-to-real transfer. The results show that spike cameras, when coupled with carefully designed spatiotemporal and attention mechanisms, can achieve accurate, efficient multi-object detection in challenging autonomous-driving scenarios, highlighting practical viability for ultra-fast perception systems.

Abstract

Object detection in autonomous driving suffers from motion blur and saturation under fast motion and extreme lighting. Spike cameras offer microsecond latency and ultra-high dynamic range for object detection through per-pixel asynchronous integrate-and-fire. However, their sparse, discrete output cannot be processed by standard image-based detectors, posing a critical challenge for end-to-end spike-stream detection. We propose EASD, an end-to-end spike camera detector with a dual-branch design: a Temporal-Based Texture plus Feature Fusion branch for global cross-slice semantics, and an Entropy Selective Attention branch for object-centric details. To close the data gap, we introduce DSEC-Spike, the first driving-oriented simulated spike detection benchmark.
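The integrate-and-fire principle mentioned in the abstract can be illustrated with a minimal single-pixel sketch. This is not the authors' simulator: the function name `integrate_and_fire`, the unit threshold, and the subtract-on-fire reset rule are all assumptions chosen for illustration. It only shows why a brighter pixel produces a denser spike train than a dimmer one.

```python
def integrate_and_fire(intensities, threshold=1.0):
    """Simulate one pixel: accumulate light each time step and emit a
    spike (1) whenever the accumulator reaches the threshold.

    Hypothetical helper; the threshold value and the reset rule are assumptions.
    """
    accumulator = 0.0
    spikes = []
    for light in intensities:
        accumulator += light
        if accumulator >= threshold:
            spikes.append(1)
            accumulator -= threshold  # assumed reset: keep the residual charge
        else:
            spikes.append(0)
    return spikes


# A bright pixel fires often; a dim pixel fires rarely.
bright = integrate_and_fire([0.5] * 8)
dim = integrate_and_fire([0.125] * 8)
print(bright)  # [0, 1, 0, 1, 0, 1, 0, 1]
print(dim)     # [0, 0, 0, 0, 0, 0, 0, 1]
```

The output is a binary stream per pixel rather than an intensity image, which is the sparse, discrete format that standard image-based detectors are not designed to consume.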

Paper Structure

This paper contains 12 sections, 12 equations, 7 figures, 6 tables.

Figures (7)

  • Figure 1: Illustration comparing motion blur effects in conventional cameras and spike cameras under high-speed conditions. While conventional cameras suffer from missed detections due to blur, spike cameras preserve temporal fidelity. This figure is for visualization purposes only.
  • Figure 2: Illustration of the working mechanism of spike cameras, highlighting how they convert intensity changes into asynchronous spike signals.
  • Figure 3: Overview of EASD. A dual-branch architecture: an upper branch that aggregates cross‑slice global texture and semantics via a Temporal‑Based Texture Module and a Feature Fusion Module; a lower branch that enhances object‑centric cues through an Entropy Selective Attention Module, adaptively focusing on regions likely to contain targets.
  • Figure 4: Entropy Selective Attention Module. The entropy block partitions the feature map into foreground (orange mask) and background regions (green mask). The foreground regions utilize deformable window attention for enhancement, whereas the other regions undergo simple convolution.
  • Figure 5: Entropy Block. The first step computes the entropy value for each window in the feature map and merges adjacent windows to calculate the average entropy. The second step selects foreground windows from the merged windows based on a specific entropy range. Squares with different colors represent different entropy levels, while the red-bordered square indicates the selected window.
  • ...and 2 more figures
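
The two steps described in the Figure 5 caption (per-window entropy, merging adjacent windows by averaging, then selecting windows within an entropy range) can be sketched as follows. This is a rough illustration under stated assumptions, not the paper's implementation: the Shannon entropy over discrete values, the non-overlapping window layout, the 2x2 merge factor, the range bounds, and all function names are hypothetical. A real feature map would be continuous and would need quantizing first.

```python
import math
from collections import Counter


def window_entropy(values):
    """Shannon entropy in bits of the empirical distribution of `values` (assumed definition)."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_grid(feature_map, win):
    """Entropy of each non-overlapping win x win window.

    Assumes the map's height and width are divisible by `win`.
    """
    rows, cols = len(feature_map), len(feature_map[0])
    grid = []
    for r in range(0, rows, win):
        row = []
        for c in range(0, cols, win):
            vals = [feature_map[i][j]
                    for i in range(r, r + win)
                    for j in range(c, c + win)]
            row.append(window_entropy(vals))
        grid.append(row)
    return grid


def merge_average(grid, factor=2):
    """Merge factor x factor blocks of adjacent windows by averaging their entropy."""
    merged = []
    for r in range(0, len(grid), factor):
        row = []
        for c in range(0, len(grid[0]), factor):
            block = [grid[i][j]
                     for i in range(r, r + factor)
                     for j in range(c, c + factor)]
            row.append(sum(block) / len(block))
        merged.append(row)
    return merged


def select_foreground(merged, low, high):
    """Return (row, col) indices of merged windows whose entropy lies in [low, high]."""
    return [(r, c)
            for r, row in enumerate(merged)
            for c, e in enumerate(row)
            if low <= e <= high]


# Toy 8x8 map: a checkerboard texture in the top-left quadrant, flat elsewhere.
feature_map = [[(i + j) % 2 if i < 4 and j < 4 else 0 for j in range(8)]
               for i in range(8)]
merged = merge_average(entropy_grid(feature_map, win=2), factor=2)
foreground = select_foreground(merged, low=0.5, high=1.5)
print(foreground)  # [(0, 0)]
```

In this toy case only the textured quadrant falls inside the chosen entropy range, so it alone would be routed to the attention path described in Figure 4, while the flat regions would receive the simple convolution.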