Towards Consistent Object Detection via LiDAR-Camera Synergy
Kai Luo, Hao Wu, Kefu Yi, Kailun Yang, Wei Hao, Rongdong Hu
TL;DR
The paper addresses consistent cross-modal object detection across LiDAR and camera data. It introduces an end-to-end Consistency Object Detection (COD) framework that uses LiDAR proposals to initialize image queries, so that a single forward pass yields 3D and 2D detections sharing the same object identity. It also proposes the Consistency Precision (CP) metric to quantify cross-modal correspondence. The method combines a configurable LiDAR detector with an RT-DETR image detector, using learnable query initialization, Hungarian-based matching, and a joint training loss, and it remains robust to calibration inaccuracies on the KITTI and DAIR-V2X benchmarks. The work provides new benchmarks, reports strong cross-modal consistency, and offers a practical approach to robust multimodal perception in driving scenes, with potential applications in human-machine interaction.
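To make the idea of seeding image queries from LiDAR proposals more concrete, the sketch below shows only the geometric part: projecting the center of each 3D proposal into the image and normalizing it into a reference point. This is an illustration under stated assumptions, not the paper's implementation. The summary says the query initialization is learnable, and that part is not reproduced here. The function names and the single 3x4 matrix `P` (assumed to already combine camera intrinsics with the LiDAR-to-camera extrinsics) are hypothetical.

```python
def project_point(P, xyz):
    """Project a 3D LiDAR point to pixel coordinates with a 3x4 matrix P.

    P is assumed (hypothetically) to map homogeneous LiDAR coordinates
    directly to homogeneous image coordinates.
    """
    x, y, z = xyz
    h = (x, y, z, 1.0)
    u, v, w = (sum(P[r][c] * h[c] for c in range(4)) for r in range(3))
    if w <= 0:
        return None  # point lies behind the camera
    return (u / w, v / w)


def init_reference_points(P, centers, width, height):
    """Return (proposal_index, u_norm, v_norm) for proposals visible in the image.

    Keeping the proposal index is what lets a 2D detection inherit the
    identity of the 3D proposal that seeded it.
    """
    refs = []
    for idx, center in enumerate(centers):
        uv = project_point(P, center)
        if uv is None:
            continue
        u, v = uv
        if 0 <= u < width and 0 <= v < height:
            refs.append((idx, u / width, v / height))
    return refs


# Toy example: focal length 700 px, principal point (600, 180).
P = [
    [700.0, 0.0, 600.0, 0.0],
    [0.0, 700.0, 180.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
centers = [(0.0, 0.0, 10.0), (0.0, 0.0, -5.0), (100.0, 0.0, 10.0)]
refs = init_reference_points(P, centers, width=1200, height=360)
print(refs)
```

Only the first proposal survives in this toy case: the second lies behind the camera and the third projects outside the image. Because each reference point carries its proposal index, the 2D and 3D outputs stay tied to one object without a separate association step.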
Abstract
As human-machine interaction continues to evolve, the capacity for environmental perception is becoming increasingly crucial. Integrating the two most common types of sensory data, images and point clouds, can enhance detection accuracy. However, no existing model can detect an object's position in both point clouds and images while also determining the correspondence between the two. This information is invaluable for human-machine interaction and opens new possibilities for enhancing it. To this end, this paper introduces an end-to-end Consistency Object Detection (COD) algorithm framework that requires only a single forward inference to obtain an object's position in both point clouds and images and to establish their correlation. Furthermore, to assess the accuracy of the object correlation between point clouds and images, this paper proposes a new evaluation metric, Consistency Precision (CP). To verify the effectiveness of the proposed framework, extensive experiments are conducted on the KITTI and DAIR-V2X datasets. The study also examines how the proposed consistency detection method performs on images when the calibration parameters between images and point clouds are disturbed, in comparison with existing post-processing methods. The experimental results demonstrate that the proposed method exhibits excellent detection performance and robustness, achieving end-to-end consistency detection. The source code will be made publicly available at https://github.com/xifen523/COD.
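The abstract does not give the formula for Consistency Precision, so the following is only a simplified stand-in that conveys what a pair-level metric has to check. It assumes each prediction is a (2D box, 3D center) pair and counts a pair as correct only when both parts match the same ground-truth object. The thresholds, the use of center distance in place of 3D IoU, the greedy matching, and all function names are assumptions made for illustration.

```python
import math


def iou_2d(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def pair_precision(pred_pairs, gt_pairs, iou_thr=0.5, dist_thr=1.0):
    """Fraction of predicted (box2d, center3d) pairs matching one ground-truth pair.

    A predicted pair is a true positive only if a single unused ground-truth
    object satisfies BOTH the 2D IoU test and the 3D center-distance test.
    This is an illustrative simplification, not the paper's CP definition.
    """
    used = set()
    true_positives = 0
    for box2d, center3d in pred_pairs:
        for j, (gt_box2d, gt_center3d) in enumerate(gt_pairs):
            if j in used:
                continue
            same_2d = iou_2d(box2d, gt_box2d) >= iou_thr
            same_3d = math.dist(center3d, gt_center3d) <= dist_thr
            if same_2d and same_3d:
                used.add(j)
                true_positives += 1
                break
    return true_positives / len(pred_pairs) if pred_pairs else 0.0


gt = [((0, 0, 10, 10), (0.0, 0.0, 0.0)), ((20, 20, 30, 30), (5.0, 5.0, 0.0))]

# Every box is individually correct, but the 2D/3D association is swapped.
swapped = [((0, 0, 10, 10), (5.0, 5.0, 0.0)), ((20, 20, 30, 30), (0.0, 0.0, 0.0))]

consistent_score = pair_precision(gt, gt)
swapped_score = pair_precision(swapped, gt)
print(consistent_score, swapped_score)
```

The swapped case is the point of the example: per-modality metrics would score every box as correct, whereas a pair-level score drops to zero because no prediction links the right 2D box to the right 3D object.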
