Influence of Camera-LiDAR Configuration on 3D Object Detection for Autonomous Driving
Ye Li, Hanjiang Hu, Zuxin Liu, Xiaohao Xu, Xiaonan Huang, Ding Zhao
TL;DR
Problem: camera-LiDAR configurations significantly influence fusion-based 3D object detection in autonomous driving. Approach: introduce a unified information-theoretic surrogate metric based on a ray-casting sensor perception model and probabilistic occupancy grids, plus an accelerated CARLA framework for data collection, training, and evaluation. Contributions: a formal problem formulation; definitions of $S_{MIG|C_0}$ and $S_{MS}$; and extensive CARLA-based experiments showing configuration-induced detection differences of up to 30% in average precision (AP) on nuScenes-like data. Significance: provides a practical tool for quickly evaluating and optimizing sensor placements without costly real-world data collection or retraining, with potential extension to additional sensors.
Abstract
Cameras and LiDARs are both important sensors for autonomous driving and play critical roles in 3D object detection. Camera-LiDAR fusion has become a prevalent solution for robust and accurate driving perception. In contrast to the vast majority of existing work, which focuses on improving 3D object detection through cross-modal schemes, deep learning algorithms, and training tricks, we study how sensor configurations affect the performance of learning-based methods. To this end, we propose a unified information-theoretic surrogate metric for evaluating camera and LiDAR configurations, built on our proposed sensor perception model. We also design an accelerated, high-quality framework for data acquisition, model training, and performance evaluation that works with the CARLA simulator. To show the correlation between detection performance and our surrogate metrics, we conduct experiments using several camera-LiDAR placements and parameters inspired by self-driving companies and research institutions. Extensive experimental results for representative algorithms on the nuScenes dataset validate the effectiveness of our surrogate metric and demonstrate that sensor configurations significantly impact detection models based on point-cloud-image fusion, accounting for a discrepancy of up to 30% in average precision.
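
To give a rough intuition for an information-theoretic score over a probabilistic occupancy grid, the following is a minimal toy sketch, not the definition of $S_{MIG|C_0}$ or $S_{MS}$. It assumes independent Bernoulli voxels, a fixed-step ray march that ignores occlusion, and perfect observation of every voxel a ray passes through. All function names and parameters (`bernoulli_entropy`, `covered_voxels`, `toy_information_gain`, `step`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def bernoulli_entropy(p):
    """Shannon entropy in bits of independent Bernoulli occupancy probabilities."""
    p = np.clip(p, 1e-12, 1 - 1e-12)  # avoid log(0)
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))


def covered_voxels(origin, directions, grid_shape, voxel_size, max_range, step=0.1):
    """Mark voxels traversed by rays marched from `origin` in fixed steps.

    Assumption: occlusion is ignored, so a ray covers every voxel along its path.
    """
    covered = np.zeros(grid_shape, dtype=bool)
    ts = np.arange(0.0, max_range, step)
    shape = np.array(grid_shape)
    for d in directions:
        d = np.asarray(d, dtype=float)
        d = d / np.linalg.norm(d)
        pts = origin + ts[:, None] * d              # sample points along the ray
        idx = np.floor(pts / voxel_size).astype(int)
        inside = np.all((idx >= 0) & (idx < shape), axis=1)
        idx = idx[inside]
        covered[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return covered


def toy_information_gain(prior, covered):
    """Total prior entropy of covered voxels, i.e. the entropy removed if each
    covered voxel were observed perfectly (a simplifying assumption)."""
    return float(bernoulli_entropy(prior)[covered].sum())


# Usage: a 4x4x4 grid with maximally uncertain voxels and one ray along +x.
prior = np.full((4, 4, 4), 0.5)
mask = covered_voxels(np.array([0.5, 0.5, 0.5]), [[1.0, 0.0, 0.0]],
                      prior.shape, voxel_size=1.0, max_range=10.0)
gain = toy_information_gain(prior, mask)
```

Under these assumptions, a configuration whose rays cover more high-uncertainty voxels receives a higher score, which is the qualitative behavior a surrogate metric of this kind is meant to capture.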
