MT-PCR: Leveraging Modality Transformation for Large-Scale Point Cloud Registration with Limited Overlap
Yilong Wu, Yifan Duan, Yuxi Chen, Xinran Zhang, Yedong Shen, Jianmin Ji, Yanyong Zhang, Lu Zhang
TL;DR
This paper tackles large-scale point cloud registration under limited overlap, a common challenge when fusing terrestrial laser scanning (TLS) and aerial laser scanning (ALS) data. It introduces MT-PCR, a modality-transformation approach that converts 3D point clouds into 2D bird's-eye-view (BEV) images to enable efficient 2D keypoint extraction and matching, then maps the correspondences back to 3D and estimates the rigid transform via SVD with ICP refinement. Key contributions include density-aware BEV processing, BEV image enhancement, a FOCUS module for low-overlap cases, and a robust 3D pose estimation pipeline. The method is validated on the GrAco TLS-ALS dataset, where MT-PCR achieves low rotational and translational errors and high SRR across multiple combinations. It reduces computational load while improving robustness and generalization in large-scale, cross-modal registration, with potential for integration into multi-robot and multimodal fusion systems.
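The SVD step mentioned above can be illustrated with the standard closed-form (Kabsch) solution for a rigid transform from matched 3D points. This is a minimal sketch, not the paper's implementation: the function name estimate_rigid_transform is hypothetical, the correspondences are assumed to be outlier-free, and the ICP refinement is omitted.

```python
import numpy as np

def estimate_rigid_transform(src, dst):
    """Least-squares rigid transform (R, t) such that dst ~= src @ R.T + t.

    src, dst: (N, 3) arrays of matched 3D points (assumed outlier-free).
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)

    # Center both point sets on their centroids.
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)

    # 3x3 cross-covariance between the centered sets.
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t
```

In practice the 2D-derived correspondences would contain mismatches, so a robust wrapper such as RANSAC around this solver is a common choice; whether MT-PCR uses one is not stated here.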
Abstract
Registering large-scale scene point clouds with limited overlap is challenging because of the heavy computational load and constrained data acquisition. To tackle these issues, we propose MT-PCR, a point cloud registration method based on modality transformation. MT-PCR leverages a bird's-eye-view (BEV) image that captures the maximal overlap information to improve accuracy, and uses images to provide complementary spatial features. Specifically, MT-PCR converts 3D point clouds to BEV images and estimates correspondences through 2D image keypoint extraction and matching. The 2D correspondences are then transformed back to the 3D point clouds by inverse mapping. We apply MT-PCR to Terrestrial Laser Scanning (TLS) and Aerial Laser Scanning (ALS) point cloud registration on the GrAco dataset, involving 8 low-overlap, square-kilometer-scale registration scenarios. Experiments and comparisons with commonly used methods demonstrate that MT-PCR achieves superior accuracy and robustness in large-scale scenes with limited overlap.
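The forward and inverse mappings between a 3D point cloud and a BEV image can be sketched as follows. This is an illustrative assumption rather than the paper's pipeline: it rasterizes a simple max-height image on a uniform grid, whereas MT-PCR describes density-aware BEV processing and image enhancement. The names points_to_bev and pixel_to_3d and the resolution parameter are hypothetical.

```python
import numpy as np

def points_to_bev(points, resolution=0.5):
    """Rasterize an (N, 3) cloud into a max-height BEV image.

    Returns the image and the (x_min, y_min) origin needed to invert the mapping.
    Cells that receive no point keep the value -inf.
    """
    points = np.asarray(points, dtype=float)
    origin = points[:, :2].min(axis=0)

    # Grid indices: x maps to columns, y maps to rows.
    idx = np.floor((points[:, :2] - origin) / resolution).astype(int)
    cols, rows = idx[:, 0], idx[:, 1]

    height = np.full((rows.max() + 1, cols.max() + 1), -np.inf)
    # Keep the highest z value that falls into each cell.
    np.maximum.at(height, (rows, cols), points[:, 2])
    return height, origin

def pixel_to_3d(pixels, height, origin, resolution=0.5):
    """Map (col, row) pixels back to 3D points at cell centers.

    The z coordinate is the height stored in the BEV image for that cell.
    """
    pixels = np.asarray(pixels, dtype=int)
    cols, rows = pixels[:, 0], pixels[:, 1]
    xy = origin + (pixels + 0.5) * resolution
    z = height[rows, cols]
    return np.column_stack([xy, z])
```

Matched 2D keypoints from the two BEV images would be passed through pixel_to_3d (each with its own origin) to obtain 3D correspondences. The recovered x and y are quantized to the cell size, which is one reason a refinement stage such as ICP is useful afterwards.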
