CoFiI2P: Coarse-to-Fine Correspondences for Image-to-Point Cloud Registration

Shuhao Kang; Youqi Liao; Jianping Li; Fuxun Liang; Yuhao Li; Xianghong Zou; Fangning Li; Xieyuanli Chen; Zhen Dong; Bisheng Yang

TL;DR

CoFiI2P is a novel image-to-point cloud (I2P) registration network that extracts correspondences in a coarse-to-fine manner. It improves RRE by 84% and RTE by 89% compared to the current state-of-the-art (SOTA) method.

Abstract

Image-to-point cloud (I2P) registration is a fundamental task for robots and autonomous vehicles to achieve cross-modality data fusion and localization. Current I2P registration methods primarily focus on estimating correspondences at the point or pixel level, often neglecting global alignment. As a result, I2P matching can easily converge to a local optimum if it lacks high-level guidance from global constraints. To improve the success rate and general robustness, this paper introduces CoFiI2P, a novel I2P registration network that extracts correspondences in a coarse-to-fine manner. First, the image and point cloud data are processed through a two-stream encoder-decoder network for hierarchical feature extraction. Second, a coarse-to-fine matching module is designed to leverage these features and establish robust feature correspondences. Specifically, in the coarse matching phase, a novel I2P transformer module is employed to capture both homogeneous and heterogeneous global information from the image and point cloud data. This enables the estimation of coarse super-point/super-pixel matching pairs with discriminative descriptors. In the fine matching module, point/pixel pairs are established with the guidance of super-point/super-pixel correspondences. Finally, based on the matching pairs, the transformation matrix is estimated with the EPnP-RANSAC algorithm. Experiments conducted on the KITTI Odometry dataset demonstrate that CoFiI2P achieves a relative rotation error (RRE) of 1.14 degrees and a relative translation error (RTE) of 0.29 meters, while maintaining real-time speed. Additional experiments on the nuScenes dataset confirm the method's generalizability. The project page is available at https://whu-usi3dv.github.io/CoFiI2P.

Paper Structure (28 sections, 18 equations, 4 figures, 2 tables)

Introduction
Related work
Same-modality Registration
I2I Registration
P2P Registration
Cross-modality Registration
Fine Registration Methods
Coarse Registration Methods
Methodology
Feature Extraction
I2P Coarse Matching
I2P Transformer
Super-point/-pixel Matching
I2P Fine Matching
EPnP-RANSAC based Pose Estimation
...and 13 more sections

Figures (4)

Figure 1: Comparison of existing one-stage I2P registration and proposed coarse-to-fine I2P registration. (a) The existing one-stage registration pipeline. The matching pairs are directly established at the point/pixel level, leading to a significant number of mismatches. (b) Our coarse-to-fine matching pipeline. Under the guidance of super point-to-pixel pairs, point-to-pixel pairs are generated from the existing super pairs, which effectively eliminates most mismatches.
Figure 2: Workflow of CoFiI2P. The proposed method consists of feature extraction, coarse matching, fine matching, and pose estimation modules. The image and point cloud are sent to the feature extraction module to obtain coarse-level and fine-level features (rendered in red for the image and green for the point cloud, respectively). The coarse-level features are strengthened by the I2P transformer module and then matched using cosine similarity. Fine features are gathered from the last layer of the decoder. In each super-point/super-pixel pair, the node point is set as the candidate and the corresponding pixel is selected from the super-pixel area, a $w \times w$ window. The generated fine-level matching pairs are used to estimate the pose with the EPnP-RANSAC algorithm (Lepetit et al., 2009; Fischler and Bolles, 1981).
Figure 3: Quantitative registration results on the KITTI Odometry dataset. The colors are rendered based on depth, ranging from blue in the foreground to red in the distance.
Figure 4: Quantitative results of correspondences. (a) and (b) show the inlier ratio of our method (blue line) and CorrI2P (orange line) on the KITTI Odometry and nuScenes datasets, respectively.
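The coarse-to-fine matching described in the Figure 2 caption can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: it assumes each super-point is assigned the super-pixel with the highest cosine similarity, and that the super-pixel at coarse location (r, c) covers the fine-resolution window of rows r*w to (r+1)*w and columns c*w to (c+1)*w. The function names, feature dimensions, and random toy features are all hypothetical, and the learned I2P transformer is omitted.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale feature vectors along the last axis to unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def coarse_match(point_feats, pixel_feats):
    """Assign each super-point the super-pixel with the highest cosine similarity.

    point_feats: (M, C) super-point features; pixel_feats: (Hc, Wc, C) super-pixel features.
    Returns an (M, 2) integer array of (row, col) super-pixel indices.
    """
    Hc, Wc, C = pixel_feats.shape
    sim = l2_normalize(point_feats) @ l2_normalize(pixel_feats.reshape(-1, C)).T
    best = sim.argmax(axis=1)
    return np.stack(np.unravel_index(best, (Hc, Wc)), axis=1)

def fine_match(node_feats, fine_pixel_feats, coarse_rc, w):
    """For each node point, pick the best pixel inside the w x w window of its super-pixel."""
    matches = []
    for feat, (r, c) in zip(l2_normalize(node_feats), coarse_rc):
        window = l2_normalize(fine_pixel_feats[r * w:(r + 1) * w, c * w:(c + 1) * w])
        sim = window @ feat  # (w, w) cosine similarities
        dr, dc = np.unravel_index(sim.argmax(), sim.shape)
        matches.append((r * w + dr, c * w + dc))
    return np.array(matches)

# Toy data (assumption): a 2 x 2 grid of super-pixels, each covering a 4 x 4 pixel window.
rng = np.random.default_rng(0)
w = 4
coarse_pixel_feats = rng.normal(size=(2, 2, 8))
fine_pixel_feats = rng.normal(size=(2 * w, 2 * w, 8))

# Ground-truth pixels for three node points; their features are copied from the
# image so that a correct matcher must recover exactly these locations.
true_pixels = np.array([[1, 6], [5, 2], [7, 7]])
true_coarse = true_pixels // w
super_point_feats = coarse_pixel_feats[true_coarse[:, 0], true_coarse[:, 1]]
node_feats = fine_pixel_feats[true_pixels[:, 0], true_pixels[:, 1]]

coarse_rc = coarse_match(super_point_feats, coarse_pixel_feats)
pixel_matches = fine_match(node_feats, fine_pixel_feats, coarse_rc, w)
print(coarse_rc)
print(pixel_matches)
```

Restricting the fine search to the window of the matched super-pixel is what keeps the point-to-pixel candidates consistent with the coarse correspondence, which is the mechanism Figure 1(b) credits for eliminating most mismatches.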
