Automotive Object Detection via Learning Sparse Events by Spiking Neurons

Hu Zhang; Yanchen Li; Luziwei Leng; Kaiwei Che; Qian Liu; Qinghai Guo; Jianxing Liao; Ran Cheng

TL;DR

This work tackles automotive object detection with event-based sensors by introducing SpikeFPN, a spiking feature pyramid network built on threshold-adaptive spiking neurons. Trained with surrogate gradients and organized as a spike-driven encoder, feature pyramid, and prediction head, it achieves an mAP of 0.477 on the GEN1 Automotive Detection (GAD) benchmark while keeping inference sparse and energy-efficient. The key contributions are a spike-triggered adaptive threshold mechanism, a dedicated spiking backbone with a multi-scale feature pyramid, and ablations that show the benefits of SBT input encoding and ALIF neurons for robust, real-time detection. SpikeFPN outperforms the selected SNN baselines and attention-enhanced ANN baselines, which makes it attractive for low-power, low-latency automotive perception in event-driven systems.

Abstract

Event-based sensors, distinguished by their high temporal resolution of 1 $\mu\text{s}$ and a dynamic range of 120 $\text{dB}$, stand out as ideal tools for deployment in fast-paced settings like vehicles and drones. Traditional object detection techniques that utilize Artificial Neural Networks (ANNs) face challenges due to the sparse and asynchronous nature of the events these sensors capture. In contrast, Spiking Neural Networks (SNNs) offer a promising alternative, providing a temporal representation that is inherently aligned with event-based data. This paper explores the unique membrane potential dynamics of SNNs and their ability to modulate sparse events. We introduce an innovative spike-triggered adaptive threshold mechanism designed for stable training. Building on these insights, we present a specialized spiking feature pyramid network (SpikeFPN) optimized for automotive event-based object detection. Comprehensive evaluations demonstrate that SpikeFPN surpasses both traditional SNNs and advanced ANNs enhanced with attention mechanisms. Specifically, SpikeFPN achieves a mean Average Precision (mAP) of 0.477 on the GEN1 Automotive Detection (GAD) benchmark dataset, a significant improvement over the selected SNN baselines. Moreover, the efficient design of SpikeFPN delivers robust performance at low computational cost, owing to its inherently sparse computation. Source code is publicly accessible at https://github.com/EMI-Group/spikefpn.
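
To make the idea of a spike-triggered adaptive threshold concrete, the following is a minimal sketch of a generic adaptive-threshold leaky integrate-and-fire neuron. It is not the formulation used in the paper, whose equations are not reproduced on this page. The function name `alif_run` and the parameters `decay_v`, `theta0`, `decay_theta`, and `beta` are hypothetical and chosen only for illustration.

```python
def alif_run(inputs, decay_v=0.9, theta0=1.0, decay_theta=0.95, beta=0.2):
    """Simulate one generic adaptive-threshold LIF neuron (illustrative only).

    inputs      : sequence of input currents, one per time step
    decay_v     : leak factor of the membrane potential (assumed value)
    theta0      : baseline firing threshold (assumed value)
    decay_theta : decay of the threshold back toward theta0 (assumed value)
    beta        : threshold increment added after each spike (assumed value)
    Returns the list of emitted spikes (0 or 1 per time step).
    """
    v, theta = 0.0, theta0
    spikes = []
    for x in inputs:
        v = decay_v * v + x              # leaky integration of the input
        s = 1 if v >= theta else 0       # fire when the potential reaches the threshold
        if s:
            v = 0.0                      # hard reset after a spike
        # The threshold relaxes toward its baseline and jumps by beta on a spike.
        theta = theta0 + decay_theta * (theta - theta0) + beta * s
        spikes.append(s)
    return spikes


if __name__ == "__main__":
    drive = [1.6] * 20
    fixed = alif_run(drive, beta=0.0)      # beta = 0 gives a plain LIF neuron
    adaptive = alif_run(drive, beta=0.5)   # threshold rises after every spike
    print("fixed threshold spikes:   ", sum(fixed))
    print("adaptive threshold spikes:", sum(adaptive))
```

Under a constant strong input, the fixed-threshold neuron fires at every step, while the adaptive neuron raises its threshold after each spike and therefore fires less often. This self-regulation of firing rate is the general behavior an adaptive threshold provides; how SpikeFPN parameterizes and trains it is described in the paper itself.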

Paper Structure (23 sections, 6 equations, 10 figures, 9 tables)

Introduction
Background
Event-based Object Detection
Deep SNNs for Vision Tasks
Feature Pyramid Networks
Method
Event Data Representation
Network Design
Network Architecture
Neural Behavior and Adaptation
Network Training
Experiments
Experiment on GAD
Input Data Representation
Hyper-Parameters Setting
...and 8 more sections

Figures (10)

Figure 1: Overview of the proposed spiking feature pyramid network. The design encompasses an encoder backbone facilitated by a multi-stage spiking network. Different stages of the backbone contribute to the integrated spiking feature pyramid. Subsequent to this, a multi-head prediction module processes the feature pyramid's output through parallel spiking convolution layers, culminating in the generation of multiple prediction boxes. These are ultimately refined using the non-maximum suppression (NMS) method. Notably, the entire network operates on spike-based computation, allowing the advantage of multiplication-free inference.
Figure 2: Overview of the proposed encoder backbone. The structure adopts a multi-stage downsampling scheme, incorporating multiple cells interlinked via directed acyclic graphs. The vertical annotations on the left indicate the downsampling scale associated with each cell structure, while the numbers at the top left correspond to the cell's subscript, matching the detailed architecture given in Table \ref{tab:SpikeFPN}. Two spiking convolution layers, termed stem layers, perform the initial channel changes. The same cell connection topology recurs across different layers. Operations within each cell, executed at the node level, accumulate at the membrane potential level and are then emitted as spikes.
Figure 3: Prediction results of SpikeFPN (left column) and the corresponding ground truth (right column). The label "Ped" denotes the pedestrian category.
Figure 4: Comparisons of training losses and validation mAPs of SNNs using LIF and ALIF neurons for the first layer.
Figure 5: The performance comparison of the "SBT + LIF" versus "SBT + ALIF" design solutions on the GEN1 Automotive Detection testing dataset with different membrane thresholds.
...and 5 more figures
