LCV2I: Communication-Efficient and High-Performance Collaborative Perception Framework with Low-Resolution LiDAR
Xinxin Feng, Haoran Sun, Haifeng Zheng
TL;DR
The paper tackles the cost barrier of collaborative perception by enabling high-performance 3D object detection using low-resolution vehicle LiDAR. It introduces LCV2I, a multi-modal framework that fuses camera features with sparse LiDAR via Voxel-Wise Fusion, corrects cross-modal feature misalignment with Feature Offset Correction, and strengthens regional features using Regional Feature Enhancement, all guided by a region-aware communication strategy. The approach achieves higher detection accuracy than state-of-the-art methods across varying LiDAR resolutions while reducing necessary bandwidth, as demonstrated on the DAIR-V2X dataset. This work provides a practical pathway to deploy V2I collaborative perception in cost-sensitive settings without sacrificing performance.
Abstract
Vehicle-to-Infrastructure (V2I) collaborative perception leverages data collected by infrastructure sensors to enhance a vehicle's perceptual capabilities. LiDAR, a commonly used sensor in cooperative perception, is widely deployed on intelligent vehicles and infrastructure. However, its superior performance comes at a correspondingly high cost, so reducing LiDAR cost is crucial to achieving low-cost V2I. We therefore study the use of low-resolution LiDAR on the vehicle to minimize cost. Simply reducing the resolution of the vehicle's LiDAR, however, yields sparse point clouds, in which distant small objects become even harder to distinguish. In addition, traditional communication methods use bandwidth relatively inefficiently. To balance cost and perceptual accuracy under these challenges, we propose a new collaborative perception framework, LCV2I. LCV2I takes data from cameras and low-resolution LiDAR as input, and employs feature offset correction modules and regional feature enhancement algorithms to improve feature representation. Finally, we use a regional difference map and a regional score map to assess the value of collaboration content, thereby improving communication bandwidth efficiency. In summary, our approach achieves high perceptual performance while substantially reducing the demand for high-resolution sensors on the vehicle. To evaluate LCV2I, we conduct 3D object detection experiments on the real-world DAIR-V2X dataset, demonstrating that LCV2I consistently outperforms existing algorithms.
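To make the region-aware communication idea concrete, the following is a minimal sketch and not the paper's implementation. The abstract does not say how the regional difference map and the regional score map are computed or combined, so everything here is an assumption: both maps are taken to be same-sized 2D grids of values in [0, 1], a region's value is taken to be the product of its score and its difference, and the bandwidth limit is modeled as a fixed number of regions. The function name `select_regions` and the parameter `budget` are hypothetical.

```python
from typing import List, Tuple

Grid = List[List[float]]


def select_regions(score_map: Grid, diff_map: Grid, budget: int) -> List[Tuple[int, int]]:
    """Pick up to `budget` grid cells (row, col) worth transmitting.

    Assumptions (not taken from the paper):
      - score_map[r][c] estimates how likely region (r, c) contains an object;
      - diff_map[r][c] estimates how much the infrastructure's features for
        that region differ from what the vehicle already perceives;
      - a region's value is the product of the two.
    """
    if budget <= 0:
        return []

    ranked = []
    for r, (score_row, diff_row) in enumerate(zip(score_map, diff_map)):
        for c, (score, diff) in enumerate(zip(score_row, diff_row)):
            value = score * diff  # assumed combination rule
            if value > 0.0:  # regions with zero value are never sent
                ranked.append((value, r, c))

    # Highest value first; ties are broken by row, then column.
    ranked.sort(key=lambda item: (-item[0], item[1], item[2]))
    return [(r, c) for _, r, c in ranked[:budget]]


if __name__ == "__main__":
    score = [[0.9, 0.2],
             [0.8, 0.7]]
    diff = [[0.1, 0.9],
            [0.9, 0.0]]
    print(select_regions(score, diff, budget=2))
```

In this toy grid, the cell with a high score but almost no difference ranks low, and the cell with zero difference is never sent, which is the intuition behind transmitting only regions that are both likely to contain objects and not already well perceived by the vehicle.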
