3D Object Detection for Autonomous Driving: A Survey
Rui Qian, Xin Lai, Xirong Li
TL;DR
This survey addresses 3D object detection for autonomous driving, examining how images, LiDAR, and their fusion can robustly infer oriented 3D bounding boxes and headings. It introduces a modality-based taxonomy (image-based, point-cloud-based, and multimodal fusion), differentiates fusion paradigms (sequential vs. parallel), and reviews voxel-based, point-based, and hybrid methods, complemented by a 15-model case study with runtime, error, and robustness analyses. The work highlights that LiDAR-driven voxel and point methods currently offer the strongest accuracy and efficiency, while multimodal fusion offers robustness but requires careful alignment. It also points to future needs in uncertainty-aware perception, end-to-end depth learning, and shape-driven representations to advance safe, reliable autonomous driving systems.
Abstract
Autonomous driving is regarded as one of the most promising remedies to shield human beings from severe crashes. To this end, 3D object detection serves as the core basis of the perception stack, especially for path planning, motion prediction, and collision avoidance. Looking at the progress made so far, we attribute the main challenges to visual appearance recovery in the absence of depth information from images, representation learning from partially occluded, unstructured point clouds, and semantic alignment of heterogeneous features across modalities. Despite existing efforts, 3D object detection for autonomous driving is still in its infancy. Recently, a large body of literature has addressed this 3D vision task. Nevertheless, few investigations have looked into collecting and structuring this growing knowledge. We therefore aim to fill this gap with a comprehensive survey, encompassing all the main concerns, including sensors, datasets, performance metrics, and recent state-of-the-art detection methods, together with their pros and cons. Furthermore, we provide quantitative comparisons with the state of the art. A case study on fifteen selected representative methods is presented, involving runtime analysis, error analysis, and robustness analysis. Finally, we provide concluding remarks after an in-depth analysis of the surveyed works and identify promising directions for future work.
