Explore the LiDAR-Camera Dynamic Adjustment Fusion for 3D Object Detection
Yiran Yang, Xu Gao, Tong Wang, Xin Hao, Yifeng Shi, Xiao Tan, Xiaoqing Ye, Jingdong Wang
TL;DR
This work tackles the modality gap in LiDAR-camera fusion for 3D object detection by introducing a dynamic adjustment fusion framework. It combines a triphase domain aligning module that co-aligns camera and LiDAR features with the ground truth domain, a modal interaction and specialty enhancement module that enriches cross-modal representations, a dynamic fusion mechanism that fuses features in the spatial and channel domains, and an adaptive learning technique that optimizes diverse instances using semantic and geometric cues. Extensive experiments on nuScenes show competitive performance against state-of-the-art methods, with ablations validating the contribution of each component. The approach advances robust multi-modal fusion by learning aligned, informative representations before fusion and by adapting optimization to individual instances, with potential practical impact on autonomous driving perception systems.
Abstract
Cameras and LiDAR are informative sensors for accurate and robust autonomous driving systems. However, the two sensors are heterogeneous by nature, which results in distributional modality gaps that make fusion challenging. A robust fusion technique is therefore crucial, particularly for improving 3D object detection. In this paper, we introduce a dynamic adjustment technique that aligns modal distributions and learns effective modality representations to improve the fusion process. Specifically, we propose a triphase domain aligning module that adjusts the feature distributions from both the camera and LiDAR, bringing them closer to the ground truth domain and minimizing their differences. Additionally, we explore improved representation acquisition methods for dynamic fusion, which include modal interaction and specialty enhancement. Finally, we propose an adaptive learning technique that merges semantic and geometric information for dynamic instance optimization. Extensive experiments on the nuScenes dataset show competitive performance against state-of-the-art approaches. Our code will be released in the future.
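
To make the idea of fusing features in both the spatial and channel domains more concrete, the following is a minimal NumPy sketch of one generic way to gate two bird's-eye-view (BEV) feature maps. It is not the authors' implementation, which has not been released. The function name `dynamic_fuse`, the weight arrays `w_channel` and `w_spatial`, the assumed `(C, H, W)` layout, and the sigmoid gating scheme are all illustrative assumptions.

```python
import numpy as np


def sigmoid(x):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def dynamic_fuse(cam_bev, lidar_bev, w_channel, w_spatial):
    """Blend two (C, H, W) feature maps with channel and spatial gates.

    Hypothetical parameters (assumptions, not taken from the paper):
      w_channel: (C, 2C) weights for the channel gate
      w_spatial: (2C,)   weights for the spatial gate
    """
    # Stack both modalities along the channel axis: (2C, H, W).
    stacked = np.concatenate([cam_bev, lidar_bev], axis=0)

    # Channel gate: global average pool, linear map, sigmoid -> (C,).
    pooled = stacked.mean(axis=(1, 2))
    channel_gate = sigmoid(w_channel @ pooled)

    # Spatial gate: per-location linear projection, sigmoid -> (H, W).
    spatial_gate = sigmoid(np.tensordot(w_spatial, stacked, axes=([0], [0])))

    # Combine both gates into one (C, H, W) weight in (0, 1).
    gate = channel_gate[:, None, None] * spatial_gate[None, :, :]

    # Convex combination of the two modalities at every element.
    return gate * cam_bev + (1.0 - gate) * lidar_bev


# Usage with random features and random (untrained) weights.
rng = np.random.default_rng(0)
C, H, W = 4, 3, 5
cam = rng.normal(size=(C, H, W))
lidar = rng.normal(size=(C, H, W))
fused = dynamic_fuse(
    cam, lidar,
    w_channel=rng.normal(size=(C, 2 * C)),
    w_spatial=rng.normal(size=(2 * C,)),
)
print(fused.shape)  # (4, 3, 5)
```

Because the gate lies strictly between 0 and 1, each fused value is a weighted average of the camera and LiDAR values at that position. In a trained network the gate weights would be learned, so the balance between modalities could vary per channel and per location.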
