Quantity-Aware Coarse-to-Fine Correspondence for Image-to-Point Cloud Registration
Gongxin Yao, Yixin Xuan, Yiwei Chen, Yu Pan
TL;DR
The paper addresses image-to-point cloud registration by proposing a quantity-aware coarse-to-fine framework (CFI2P) that learns soft set-to-patch correlations and refines them into point-to-pixel correspondences. It models cross-modal correlation as an optimal transport problem with continuous supervision based on bilateral point-proportions, and uses a hybrid transformer architecture with a confidence-sorting mechanism to progressively improve correspondences. Coarse matching establishes initial set-to-patch mappings, which are then refined at the fine level through resampling, attention-based learning, and masked optimal transport. The camera pose is finally estimated with an efficient RANSAC-based PnP solver. Empirical results on KITTI Odometry and NuScenes show state-of-the-art performance with high inlier ratios and robustness to density and resolution gaps, underscoring the method's practical value for multi-modal perception in robotics and autonomous systems.
Abstract
Image-to-point cloud registration aims to determine the relative camera pose between an RGB image and a reference point cloud, serving as a general solution for locating 3D objects from 2D observations. Matching individual points with pixels can be inherently ambiguous due to modality gaps. To address this challenge, we propose a framework that captures quantity-aware correspondences between local point sets and pixel patches and refines the results at both the point and pixel levels. This framework aligns the high-level semantics of point sets and pixel patches to improve matching accuracy. At the coarse scale, the set-to-patch correspondence should reflect the quantity of 3D points involved; to this end, a novel supervision strategy adaptively quantifies the degree of correlation as a continuous value. At the fine scale, point-to-pixel correspondences are refined within a smaller search space by a scheme that incorporates both resampling and quantity-aware priors. In particular, a confidence sorting strategy is proposed to proportionally select better correspondences at the final stage. Given these high-quality correspondences, the camera pose is solved with an efficient Perspective-n-Point solver within a random sample consensus (RANSAC) framework. Extensive experiments on the KITTI Odometry and NuScenes datasets demonstrate the superiority of our method over state-of-the-art methods.
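The idea of quantity-aware, continuous supervision can be illustrated with a small sketch. It is not the paper's implementation: the function name `soft_set_to_patch_labels`, its inputs, and the exact normalization are assumptions chosen for illustration. It assumes each 3D point has already been assigned to a local point set and projected into an image patch (or marked as falling outside the image), and it computes, for every set-patch pair, the proportion of points they share from both the set's side and the patch's side.

```python
from collections import Counter


def soft_set_to_patch_labels(set_ids, patch_ids, num_sets, num_patches):
    """Illustrative continuous set-to-patch labels (hypothetical helper).

    set_ids[k]   : index of the local point set that 3D point k belongs to
    patch_ids[k] : index of the image patch point k projects into,
                   or None if the point falls outside the image
    Returns two num_sets x num_patches matrices of proportions.
    """
    set_sizes = Counter(set_ids)   # all points of each set, visible or not
    patch_sizes = Counter()        # projected points landing in each patch
    pair_counts = Counter()        # points shared by a (set, patch) pair
    for s, p in zip(set_ids, patch_ids):
        if p is None:
            continue
        pair_counts[(s, p)] += 1
        patch_sizes[p] += 1

    set_view = [[0.0] * num_patches for _ in range(num_sets)]
    patch_view = [[0.0] * num_patches for _ in range(num_sets)]
    for (s, p), n in pair_counts.items():
        set_view[s][p] = n / set_sizes[s]      # share of the set seen in the patch
        patch_view[s][p] = n / patch_sizes[p]  # share of the patch covered by the set
    return set_view, patch_view


# Toy example: six points, two point sets, two patches.
set_ids = [0, 0, 0, 0, 1, 1]
patch_ids = [0, 0, 1, None, 1, 1]
set_view, patch_view = soft_set_to_patch_labels(set_ids, patch_ids, 2, 2)
```

In this toy case, half of set 0 projects into patch 0 and a quarter into patch 1, so the labels are graded values rather than a hard 0/1 assignment. How the two views are combined into a single training target is left open here, since the abstract does not specify it.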
