MT-PCR: Leveraging Modality Transformation for Large-Scale Point Cloud Registration with Limited Overlap

Yilong Wu, Yifan Duan, Yuxi Chen, Xinran Zhang, Yedong Shen, Jianmin Ji, Yanyong Zhang, Lu Zhang

TL;DR

This paper tackles large-scale point cloud registration under limited overlap, a common challenge when fusing Terrestrial Laser Scanning (TLS) and Aerial Laser Scanning (ALS) data. It introduces MT-PCR, a modality-transformation approach that converts 3D point clouds into 2D bird's-eye-view (BEV) images to enable efficient 2D keypoint extraction and matching, then maps the correspondences back to 3D to estimate the rigid transform via SVD with ICP refinement. Key contributions include density-aware BEV processing, BEV image enhancement, a FOCUS module for low-overlap cases, and a robust 3D pose estimation pipeline. The method is validated on the GrAco TLS-ALS dataset, where MT-PCR achieves low rotational and translational errors and a high SRR across multiple TLS-ALS scan combinations. The method reduces computational load while improving robustness and generalization in large-scale, cross-modal registration, with potential for integration into multi-robot and multimodal fusion systems.

Abstract

Large-scale scene point cloud registration with limited overlap is a challenging task due to computational load and constrained data acquisition. To tackle these issues, we propose a point cloud registration method, MT-PCR, based on Modality Transformation. MT-PCR leverages a bird's-eye view (BEV) that captures the maximal overlap information to improve accuracy, and uses images to provide complementary spatial features. Specifically, MT-PCR converts 3D point clouds to BEV images and estimates correspondences by 2D image keypoint extraction and matching. The 2D correspondence estimates are then transformed back to the 3D point clouds using inverse mapping. We have applied MT-PCR to Terrestrial Laser Scanning (TLS) and Aerial Laser Scanning (ALS) point cloud registration on the GrAco dataset, involving 8 low-overlap, square-kilometer-scale registration scenarios. Experiments and comparisons with commonly used methods demonstrate that MT-PCR can achieve superior accuracy and robustness in large-scale scenes with limited overlap.
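
The forward and inverse mappings described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `points_to_bev` and `pixel_to_point`, the fixed `resolution` parameter, and the choice to keep the highest point per grid cell are assumptions made for illustration, and the density-aware resolution and image enhancement steps of MT-PCR are left out. The sketch rasterizes a point cloud into a height-coded grayscale image and keeps a per-pixel index map so that a matched 2D pixel can be traced back to a 3D point.

```python
import numpy as np

def points_to_bev(points, resolution):
    """Rasterize an (N, 3) point cloud into a height-coded BEV image.

    Returns (image, index_map): a uint8 grayscale image and, per pixel,
    the index of the point that supplied its height (-1 if the cell is empty).
    Assumption: the highest point in each cell defines the pixel value.
    """
    origin = points[:, :2].min(axis=0)
    cells = np.floor((points[:, :2] - origin) / resolution).astype(int)
    cols, rows = cells[:, 0], cells[:, 1]
    width, height = int(cols.max()) + 1, int(rows.max()) + 1

    # Sort by cell id, then by z, so the last entry of each cell is its top point.
    flat = rows * width + cols
    order = np.lexsort((points[:, 2], flat))
    flat_sorted = flat[order]
    is_last = np.r_[flat_sorted[1:] != flat_sorted[:-1], True]
    top = order[is_last]

    index_map = np.full(height * width, -1, dtype=int)
    index_map[flat[top]] = top
    index_map = index_map.reshape(height, width)

    # Linearly map height to 0..255; empty cells stay at 0.
    z = points[:, 2]
    z_min, z_max = z.min(), z.max()
    scale = 255.0 / (z_max - z_min) if z_max > z_min else 0.0
    image = np.zeros((height, width), dtype=np.uint8)
    occupied = index_map >= 0
    image[occupied] = np.round((z[index_map[occupied]] - z_min) * scale).astype(np.uint8)
    return image, index_map

def pixel_to_point(row, col, points, index_map):
    """Inverse mapping: return the 3D point behind a BEV pixel, or None if empty."""
    idx = index_map[row, col]
    return None if idx < 0 else points[idx]
```

Storing the index map alongside the image is one simple way to make the 2D-to-3D step a lookup rather than a search; 2D keypoint matches on two such images then yield pairs of 3D points directly.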

Paper Structure

This paper contains 14 sections, 7 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: We introduce MT-PCR, a method for large-scale scene point cloud registration with limited overlap, particularly for TLS-ALS integration. Based on Modality Transformation, MT-PCR converts 3D point clouds into 2D BEV images, facilitating correspondence estimation through 2D image keypoint extraction and matching.
  • Figure 2: Overview of the MT-PCR pipeline. The point clouds $\mathcal{P}_{\text{source}}$ and $\mathcal{P}_{\text{target}}$, obtained from ALS and TLS, are transformed to align with the XOY plane, and their resolution is enhanced based on density. Height information is converted to grayscale and further enhanced to emphasize details, resulting in BEV images. Afterward, keypoint extraction and matching are performed using SuperPoint (DeTone et al., 2018) and LightGlue (Lindenberger et al., 2023); both steps are repeated in the FOCUS module within the overlap regions. 3D correspondences between $\mathcal{P}_{\text{source}}$ and $\mathcal{P}_{\text{target}}$ are established through inverse mapping of the 2D correspondences from the BEV images. Finally, the transformation matrix is computed from the refined 3D correspondences using the SVD algorithm.
  • Figure 3: Comparison of results before and after applying the FOCUS module. By performing 2D keypoint extraction and matching again on the neighborhood of matched points, the FOCUS module increases the number of matched points and enhances the accuracy of correspondence estimation.
  • Figure 4: Registration results of MT-PCR, FPFH, and GeoTransformer on the A03-G03 and A04-G06 combinations. Yellow and blue represent point clouds obtained from TLS and ALS, respectively.
  • Figure 5: Effects of different modules in the ablation experiments: (a) the BEV image of G03 without image enhancement (top) and with enhancement (bottom); (b) and (c) show the region in the red dashed box in (a) with and without resolution augmentation, respectively.
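
The final step named in the Figure 2 caption, computing a transformation matrix from 3D correspondences with SVD, is commonly done with the Kabsch-style closed-form solution. The sketch below assumes that this standard formulation is what is meant; the function name `estimate_rigid_transform` is hypothetical, and the outlier handling and ICP refinement mentioned in the TL;DR are not shown.

```python
import numpy as np

def estimate_rigid_transform(src, dst):
    """Least-squares rigid transform T (4x4) with R @ src[i] + t ~= dst[i].

    src, dst: (N, 3) arrays of corresponding 3D points, N >= 3 and not collinear.
    """
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)

    # Cross-covariance of the centered correspondences.
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c

    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T
```

In a pipeline like the one in Figure 2, `src` and `dst` would be the 3D points recovered by inverse mapping of the matched BEV keypoints. Because a plain least-squares fit is sensitive to wrong matches, a robust wrapper such as RANSAC is often placed around it in practice; whether MT-PCR does so is not stated in this summary.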