Deep Event-based Object Detection in Autonomous Driving: A Survey

Bingquan Zhou; Jie Jiang

TL;DR

The paper addresses object detection for autonomous driving using event cameras, which provide ultra-low latency and high dynamic range but pose challenges due to asynchronous, sparse data. It surveys four methodological families—DNNs, GNNs, SNNs, and multi-modal fusion—along with representations (event frames, voxel grids, and learnable encodings) and domain-specific detectors inspired by frame-based architectures, graph processing, and spike-based computing. It also catalogs event-only and multi-modal datasets, highlighting annotation strategies and the trade-offs of different data modalities. The study emphasizes that while event-based perception is promising for low-latency, robust driving, progress hinges on advances in memory-efficient architectures, specialized hardware, and effective cross-modal fusion to handle static scenes and texture inference, ultimately enabling practical deployment in real-time autonomous systems.

Abstract

Object detection plays a critical role in autonomous driving, where objects in fast-moving scenes must be detected accurately and efficiently. Traditional frame-based cameras struggle to balance latency and bandwidth, which calls for new solutions. Event cameras have emerged as promising sensors for autonomous driving due to their low latency, high dynamic range, and low power consumption. However, making effective use of asynchronous, sparse event data is challenging, particularly when object detection must remain low-latency and lightweight. This paper provides an overview of object detection using event data in autonomous driving, showcasing the competitive benefits of event cameras.

Paper Structure (25 sections, 2 equations, 5 figures, 1 table)

Introduction
Event Camera and Event Data
Event camera
Event Data Structure
Event Data Preprocessing
Event Frame (Event Image)
Voxel Grid
Learnable Representation
Methodology for Event-Based Object Detection
DNN for Event-Based Object Detection
YOLO Based Method
LSTM Based Method
Attention Based Method
Point Cloud Inspired Method
GNN for Event-Based Object Detection
...and 10 more sections

Figures (5)

Figure 1: Comparison between RGB and Event Cameras. (a) In low-light conditions, RGB cameras fail to distinguish objects from the background due to insufficient light capture, whereas event cameras excel at detecting object edges. (b) RGB cameras' limited dynamic range falls short in high-intensity scenes, unlike event cameras with their high dynamic range for clear object detection. (c) Motion blur in RGB imaging during fast movements is avoided with event cameras, which offer low-latency, blur-free detection.
Figure 2: Comparative diagram of event camera circuitry and visual nerves (adapted from event_hardware). The circuit mimics three parts of the visual sensing pathway: the photoreceptors convert light signals into electrochemical signals that neurons can transmit, the ganglion cells receive these electrochemical signals, and the signals then travel along the optic nerve, ultimately becoming a binary signal.
Figure 3: Summary diagram of object detection technology for autonomous driving scenarios based on event cameras and deep learning. It can be broadly divided into four technical approaches: traditional deep neural networks, graph neural networks, spiking neural networks, and multi-modal fusion. Large event-camera object detection datasets for driving scenarios have also been proposed to support these technologies.
Figure 4: Different Methods in Event-Based Object Detection. (a) Deep Neural Networks (DNNs) for object detection in event data are often applied after transforming raw events into an image-like format, facilitating the use of established models like YOLO. (b) Modeling event data as a graph, rather than compressing it into frames, preserves its spatio-temporal content and makes it compatible with Graph Neural Networks (GNNs). (c) Spiking Neural Networks (SNNs) directly process asynchronous event data, where individual spiking neurons generate asynchronous spike outputs once the number of received spikes reaches a threshold. (d) A multimodal fusion strategy capitalizes on the complementary attributes of event data and RGB imagery. Features are first extracted separately from each modality using its own feature extractor and then fused. The fused features are decoded to determine the bounding boxes and classify the objects.
Figure 5: Publication dates of selected papers on event-based object detection in autonomous driving scenarios.
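
Two of the mechanisms named in the Figure 4 caption can be illustrated with a minimal Python sketch: turning raw events into an image-like event frame (panel a), and a spiking neuron that fires once its count of received spikes reaches a threshold (panel c). The event layout `(x, y, t, p)` with polarity `p` in `{-1, +1}`, the function names, and the reset-to-zero behaviour after firing are assumptions made for illustration; they are not taken from the survey or from any specific detector it covers.

```python
def events_to_frame(events, height, width):
    """Accumulate signed event polarities per pixel into a 2D frame.

    `events` is an iterable of (x, y, t, p) tuples (assumed layout),
    where p is +1 for a brightness increase and -1 for a decrease.
    The timestamp t is ignored, which is exactly the temporal
    information an event frame gives up.
    """
    frame = [[0] * width for _ in range(height)]
    for x, y, _t, p in events:
        frame[y][x] += p
    return frame


def count_and_fire(input_spike_times, threshold):
    """Return the times at which a simple counting neuron fires.

    The neuron emits an output spike when the number of input spikes
    received since its last output reaches `threshold`, then resets
    its count to zero (the reset rule is an assumption of this sketch).
    """
    count = 0
    output_times = []
    for t in sorted(input_spike_times):
        count += 1
        if count >= threshold:
            output_times.append(t)
            count = 0
    return output_times


if __name__ == "__main__":
    events = [(0, 0, 0.1, 1), (0, 0, 0.2, 1), (1, 0, 0.3, -1)]
    print(events_to_frame(events, height=2, width=2))
    print(count_and_fire([0.1, 0.2, 0.3, 0.4, 0.5], threshold=2))
```

The frame produced by `events_to_frame` could be passed to a conventional frame-based detector, while `count_and_fire` keeps the output asynchronous: its spike times depend only on when inputs arrive, not on a fixed frame rate.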
