Adaptive Multi-step Refinement Network for Robust Point Cloud Registration

Zhi Chen, Yufan Ren, Tong Zhang, Zheng Dang, Wenbing Tao, Sabine Süsstrunk, Mathieu Salzmann

TL;DR

This work tackles robust point cloud registration under low overlap by introducing an adaptive multi-step refinement framework that leverages priors of varying accuracy. The method constructs priors through a degradation function that linearly interpolates between a GeoTransformer estimate and the ground truth, yielding a prior rotation $R_\tau$ and translation $t_\tau$, and trains step-specific networks conditioned on the refinement step index within a shared GeoTransformer-based backbone. A generalized one-way attention mechanism focuses on overlapping regions and integrates prior information from the previous step, enabling progressively better alignments. Experiments on 3DMatch/3DLoMatch and KITTI demonstrate state-of-the-art recall (e.g., $80.4\%$ registration recall (RR) on 3DLoMatch) and strong pose accuracy, validating the approach’s robustness to low overlap and its practical value for real-world registration tasks.
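To make the degradation function concrete, here is a minimal sketch, not the authors' implementation. The summary only states that the prior is a linear interpolation between the GeoTransformer estimate and the ground truth; the exact parametrization is not given here. The sketch assumes that $\tau \in [0, 1]$, that $\tau = 0$ returns the estimate and $\tau = 1$ the ground truth, and that the rotation is interpolated along the geodesic on SO(3) while the translation is interpolated linearly. The names `rotation_log`, `rotation_exp`, and `degrade_pose` are hypothetical.

```python
import numpy as np


def rotation_log(R):
    """Rotation vector (axis * angle) of a 3x3 rotation matrix.

    Assumes the rotation angle is strictly below pi.
    """
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    if theta < 1e-8:
        return np.zeros(3)
    axis = np.array([
        R[2, 1] - R[1, 2],
        R[0, 2] - R[2, 0],
        R[1, 0] - R[0, 1],
    ]) / (2.0 * np.sin(theta))
    return theta * axis


def rotation_exp(w):
    """Rodrigues' formula: rotation matrix of a rotation vector."""
    theta = np.linalg.norm(w)
    if theta < 1e-8:
        return np.eye(3)
    k = w / theta
    K = np.array([
        [0.0, -k[2], k[1]],
        [k[2], 0.0, -k[0]],
        [-k[1], k[0], 0.0],
    ])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)


def degrade_pose(R_est, t_est, R_gt, t_gt, tau):
    """Interpolate between an estimated pose and the ground-truth pose.

    Assumed convention: tau = 0 gives the estimate, tau = 1 the ground truth.
    """
    # Relative rotation that carries the estimate onto the ground truth.
    R_rel = R_gt @ R_est.T
    # Apply a fraction tau of that relative rotation to the estimate.
    R_tau = rotation_exp(tau * rotation_log(R_rel)) @ R_est
    # Translation is interpolated linearly.
    t_tau = (1.0 - tau) * t_est + tau * t_gt
    return R_tau, t_tau
```

Sampling `tau` at random during training would produce priors ranging from inaccurate (close to the estimate) to accurate (close to the ground truth), which is the range of registration qualities the method is said to train on.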

Abstract

Point Cloud Registration (PCR) estimates the relative rigid transformation between two point clouds of the same scene. Despite significant progress with learning-based approaches, existing methods still face challenges when the overlapping region between the two point clouds is small. In this paper, we propose an adaptive multi-step refinement network that refines the registration quality at each step by leveraging the information from the preceding step. To achieve this, we introduce a training procedure and a refinement network. Firstly, to adapt the network to the current step, we utilize a generalized one-way attention mechanism, which prioritizes the last step's estimated overlapping region, and we condition the network on step indices. Secondly, instead of training the network to map either random transformations or a fixed pre-trained model's estimations to the ground truth, we train it on transformations with varying registration qualities, ranging from accurate to inaccurate, thereby enhancing the network's adaptiveness and robustness. Despite its conceptual simplicity, our method achieves state-of-the-art performance on both the 3DMatch/3DLoMatch and KITTI benchmarks. Notably, on 3DLoMatch, our method reaches 80.4% recall rate, with an absolute improvement of 1.2%.
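The abstract says the attention prioritizes the last step's estimated overlapping region, but it does not spell out how that region is derived from the previous estimate. One plausible construction, shown here purely as an assumption, is to apply the previous transformation to the source cloud and mark as overlapping every point whose nearest neighbor in the other cloud lies within a distance threshold. The function `overlap_masks` and its `radius` parameter are hypothetical, and the brute-force distance matrix is only suitable for small clouds.

```python
import numpy as np


def overlap_masks(src, tgt, R, t, radius):
    """Estimate overlap masks from a prior rigid transformation.

    src: (N, 3) source points, tgt: (M, 3) target points.
    R, t: prior rotation (3, 3) and translation (3,) mapping src onto tgt.
    Returns boolean masks of shape (N,) and (M,) marking points that have
    a neighbor in the other cloud closer than `radius` after alignment.
    """
    src_aligned = src @ R.T + t
    # Pairwise Euclidean distances, shape (N, M).
    dists = np.linalg.norm(src_aligned[:, None, :] - tgt[None, :, :], axis=-1)
    src_mask = dists.min(axis=1) < radius
    tgt_mask = dists.min(axis=0) < radius
    return src_mask, tgt_mask


# Toy usage: two of the three points in each cloud coincide.
src = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [5.0, 0.0, 0.0]])
tgt = np.array([[0.0, 0.0, 0.05], [1.0, 0.0, 0.0], [-4.0, 0.0, 0.0]])
src_mask, tgt_mask = overlap_masks(src, tgt, np.eye(3), np.zeros(3), radius=0.1)
```

In a multi-step setting, the masks computed from step $k-1$'s estimate would be passed to step $k$, so a better estimate yields a more reliable overlap prior.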

Paper Structure

This paper contains 20 sections, 5 equations, 9 figures, and 8 tables.

Figures (9)

  • Figure 1: Illustration of the proposed framework. (a) Matching visualization. Similar structures and shapes result in similar features, leading to erroneous matches by GeoTransformer, where one key point in the source point cloud is matched to key points in the target point cloud outside the overlapping region, indicated by red lines. The ground-truth match is shown as a green line. (b) Utilizing rough overlapping information, indicated by the shaded regions, PEAL successfully filters out many incorrect matches outside the overlap area, visualized as in (a). (c) Per-step aligned point cloud visualization. Our adaptive multi-step refinement approach employs multiple models, each trained with progressively more accurate priors as the model index increases. By adapting to the increasing accuracy of the priors during the refinement steps, our method surpasses PEAL in performance. RE and TE stand for rotation error ($\downarrow$) and translation error ($\downarrow$). (Best viewed on a screen when zoomed in.)
  • Figure 2: Refinement Network at step index $k$. (1) For two overlapping point clouds $\mathcal{P}$ and $\mathcal{Q}$, we extract superpoints ($\mathcal{F}_\mathcal{P}^S$, $\mathcal{F}_\mathcal{Q}^S$) and fine points ($\mathcal{F}_\mathcal{P}^F$, $\mathcal{F}_\mathcal{Q}^F$) using KPConv. (2) We derive matchable superpoint-wise features using Transformers, where MHA stands for multi-head attention. Instead of standard self- and cross-attention, we employ our generalized one-way attention, integrating the overlapping information from the previous step $\mathcal{X}_{k-1}$. (3) We compute the patch-wise correspondence map. (4) We propagate patch-wise correspondences to fine correspondences. (5) We obtain the final estimate through a robust estimator. (Best viewed on a screen when zoomed in.)
  • Figure 3: 1D toy example. PEAL trains the network to map the prior to the ground truth. We propose instead to sample new prior transformations by interpolating between the GeoTransformer estimate and the ground truth. Upper row: Illustration of the point cloud alignment transitioning from the GeoTransformer estimate to the ground truth. (Best viewed on a screen when zoomed in.)
  • Figure 4: One-way attention (taking the source point cloud $\mathcal{P}$ as an example). (a) The self-one-way-attention uses the feature of anchor points ($\mathcal{F}_\mathcal{P}^A$) as Key and Value in the attention operation to compute the feature of non-anchor points, reducing the impact of non-overlapping regions on the learned features ($\mathcal{F}_\mathcal{P}^N$). (b) The proposed cross-one-way-attention utilizes the anchor point features in the target point cloud ($\mathcal{F}_\mathcal{Q}^A$) as Key and Value, further considering interactions between the two point clouds in the one-way-attention.
  • Figure 5: Qualitative results of CoFiNet, GeoTransformer, PEAL, and our method compared with the ground-truth alignment. The overlapping areas are highlighted by the red boxes. (Best viewed on a screen when zoomed in)
  • ...and 4 more figures
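
The one-way attention described in the Figure 4 caption can be illustrated with a minimal sketch. It assumes single-head scaled dot-product attention without learned projections and leaves the anchor features untouched; the actual network uses multi-head attention and may differ in both respects. The function names `softmax` and `one_way_attention` are hypothetical. The key point is that queries come from non-anchor points while keys and values come only from anchor points, so features from non-overlapping regions never act as keys or values.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def one_way_attention(query_feats, anchor_feats):
    """Queries attend only to anchor features (keys = values = anchors).

    query_feats: (Nq, d) features of non-anchor points.
    anchor_feats: (Na, d) features of anchor points.
    Returns the attended features (Nq, d) and the weights (Nq, Na).
    """
    d = query_feats.shape[-1]
    scores = query_feats @ anchor_feats.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    return weights @ anchor_feats, weights


rng = np.random.default_rng(0)
feats_P = rng.normal(size=(6, 4))   # source superpoint features
feats_Q = rng.normal(size=(5, 4))   # target superpoint features
anchor_P = np.array([True, True, False, False, True, False])
anchor_Q = np.array([True, False, True, False, False])

# Self-one-way attention: non-anchors of P attend to anchors of P.
self_out, self_w = one_way_attention(feats_P[~anchor_P], feats_P[anchor_P])
# Cross-one-way attention: non-anchors of P attend to anchors of Q.
cross_out, cross_w = one_way_attention(feats_P[~anchor_P], feats_Q[anchor_Q])
```

The anchor masks here are made up for illustration; in the method they would come from the overlap region estimated at the previous refinement step.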