MaFreeI2P: A Matching-Free Image-to-Point Cloud Registration Paradigm with Active Camera Pose Retrieval
Gongxin Yao, Xinyang Li, Yixin Xuan, Yu Pan
TL;DR
The paper tackles image-to-point cloud registration when modality gaps make 2D-3D matching brittle. It introduces MaFreeI2P, a matching-free approach that actively retrieves the camera pose in $SE(3)$ by sampling poses, constructing pose-based cost volumes from cross-modal embeddings, and guiding pose updates with a learned similarity function. Key innovations include a cross-modal pseudo-siamese backbone with circle loss, a pose-based cost-volume formulation, a confidence-weighted similarity estimator, and an iterative refinement loop with shrinking search spaces. Empirical results show state-of-the-art relative translation error and high recall on KITTI-Odometry, with competitive performance on Apollo-DaoxiangLake, demonstrating robustness and practical impact for cross-modal localization and mapping.
Abstract
Image-to-point cloud registration estimates the relative pose between a camera image and a point cloud, and it remains an open problem because of the gap between the two data modalities. Recent matching-based methods address it by building 2D-3D correspondences. In this paper, we reveal the information loss inherent in these methods and propose a matching-free paradigm, named MaFreeI2P. Our key insight is to actively retrieve the camera pose in SE(3) space by contrasting the geometric features of the point cloud and the query image. To achieve this, we first sample a set of candidate camera poses and construct their cost volume from the cross-modal features. Compared with matching, the cost volume preserves more information, and its feature similarity implicitly reflects the confidence of each sampled pose. We then employ a convolutional network to adaptively formulate a similarity assessment function, where the input cost volume is further refined by filtering and pose-based weighting. Finally, we update the camera pose based on the similarity scores and adopt a heuristic strategy that iteratively shrinks the pose sampling space until convergence. MaFreeI2P achieves very competitive registration accuracy and recall on the KITTI-Odometry and Apollo-DaoxiangLake datasets.
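
The retrieval loop outlined in the abstract (sample candidate poses, score them, update the estimate, shrink the sampling space) can be illustrated with a minimal sketch. This is not the authors' implementation. Several things are assumptions made for illustration only: the pose is a 6-vector of three rotation angles and three translations, candidates lie on a 3-point grid per dimension, the update picks the best-scoring candidate, and the search range is halved each round. The function `toy_score` is a hypothetical stand-in for the learned similarity network that would score a pose-based cost volume.

```python
import itertools
import numpy as np


def retrieve_pose(score_fn, init_pose, init_half_range, n_iters=8, shrink=0.5):
    """Coarse-to-fine pose search with a shrinking sampling space.

    score_fn maps a 6-vector pose [rx, ry, rz, tx, ty, tz] to a scalar
    similarity (higher is better). The parameterization, the grid sampling
    and the argmax update are illustrative assumptions.
    """
    pose = np.asarray(init_pose, dtype=float)
    half_range = np.asarray(init_half_range, dtype=float)
    # Offsets in {-1, 0, +1}^6: 729 candidates, including the current estimate.
    unit_offsets = np.array(list(itertools.product((-1.0, 0.0, 1.0), repeat=6)))
    for _ in range(n_iters):
        # Sample candidate poses around the current estimate.
        candidates = pose + unit_offsets * half_range
        # Score every candidate (stand-in for cost volume + similarity network).
        scores = np.array([score_fn(c) for c in candidates])
        # Move to the best-scoring candidate, then shrink the search space.
        pose = candidates[np.argmax(scores)]
        half_range = half_range * shrink
    return pose


# Toy setup: the "true" pose is known, so similarity is the negative distance to it.
true_pose = np.array([0.05, -0.03, 0.02, 0.8, -0.5, 0.3])


def toy_score(candidate):
    return -np.linalg.norm(candidate - true_pose)


init_half_range = np.array([0.1, 0.1, 0.1, 1.0, 1.0, 1.0])
estimate = retrieve_pose(toy_score, np.zeros(6), init_half_range)
print("per-dimension error:", np.abs(estimate - true_pose))
```

With this toy score, and provided the true pose starts inside the initial search box, each round at least halves the worst-case error in every dimension. A learned similarity function would not come with such a guarantee, which is presumably why the paper describes its shrinking strategy as heuristic.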
