Deep-PE: A Learning-Based Pose Evaluator for Point Cloud Registration

Junjie Gao, Chongjian Wang, Zhongjun Ding, Shuangmin Chen, Shiqing Xin, Changhe Tu, Wenping Wang

TL;DR

Deep-PE addresses the challenge of pose evaluation in point cloud registration under low overlap by learning a pose-specific confidence for each candidate transformation. It introduces a lightweight architecture with a Pose-Aware Attention module to simulate alignment status under different poses and a Pose Confidence Prediction module to output per-pose confidence, trained with a weighted cross-entropy loss that emphasizes poses close to the ground truth. By leveraging a pre-trained GeoTransformer feature extractor and efficient attention-based learning, Deep-PE achieves state-of-the-art registration recall on benchmarks such as 3DLoMatch with both handcrafted FPFH and learning-based FCGF descriptors, while remaining robust to low inlier ratios and capable of identifying registration failures. The approach reduces dependence on input correspondences, improves robustness in challenging scenarios, and can be integrated into existing estimator-based pipelines to enhance pose selection without regressing transformations. Overall, Deep-PE demonstrates that learning-based pose evaluation can surpass traditional statistics-based evaluators in difficult registration settings, with practical implications for robotics and 3D perception systems.

Abstract

In the realm of point cloud registration, the most prevalent pose evaluation approaches are statistics-based, identifying the optimal transformation by maximizing the number of consistent correspondences. However, registration recall decreases significantly when point clouds exhibit a low overlap rate, despite efforts in designing feature descriptors and establishing correspondences. In this paper, we introduce Deep-PE, a lightweight, learning-based pose evaluator designed to enhance the accuracy of pose selection, especially in challenging point cloud scenarios with low overlap. Our network incorporates a Pose-Aware Attention (PAA) module to simulate and learn the alignment status of point clouds under various candidate poses, alongside a Pose Confidence Prediction (PCP) module that predicts the likelihood of successful registration. These two modules facilitate the learning of both local and global alignment priors. Extensive tests across multiple benchmarks confirm the effectiveness of Deep-PE. Notably, on 3DLoMatch with a low overlap rate, Deep-PE significantly outperforms state-of-the-art methods by at least 8% and 11% in registration recall under handcrafted FPFH and learning-based FCGF descriptors, respectively. To the best of our knowledge, this is the first study to utilize deep learning to select the optimal pose without the explicit need for input correspondences.
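
To make the statistics-based baseline described in the abstract concrete, the sketch below scores each candidate pose by counting the correspondences it makes consistent and keeps the pose with the highest count. It is an illustrative sketch only, not the paper's code: the function names, the distance threshold `tau`, and the synthetic data are all assumptions introduced here.

```python
import numpy as np

def count_inliers(src, tgt, R, t, tau=0.1):
    """Count correspondences (src[i], tgt[i]) whose residual under (R, t) is below tau.

    tau is a hypothetical inlier threshold chosen for this example.
    """
    residuals = np.linalg.norm(src @ R.T + t - tgt, axis=1)
    return int(np.sum(residuals < tau))

def select_pose_by_inliers(src, tgt, poses, tau=0.1):
    """Statistics-based evaluation: return the index of the pose with most inliers."""
    counts = [count_inliers(src, tgt, R, t, tau) for R, t in poses]
    return int(np.argmax(counts)), counts

def rot_z(angle):
    """Rotation matrix for a rotation of `angle` radians about the z-axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Synthetic correspondences: 50 pairs, of which only the first 10 are correct,
# mimicking the low inlier ratio typical of low-overlap pairs.
rng = np.random.default_rng(0)
src = rng.uniform(-1.0, 1.0, size=(50, 3))
R_true, t_true = rot_z(np.pi / 6), np.array([0.5, -0.2, 0.3])
tgt = src @ R_true.T + t_true
tgt[10:] = rng.uniform(-5.0, 5.0, size=(40, 3))  # corrupt 40 matches

# Candidate poses: identity, the true pose, and an unrelated pose.
poses = [
    (np.eye(3), np.zeros(3)),
    (R_true, t_true),
    (rot_z(-np.pi / 2), np.array([1.0, 1.0, 0.0])),
]
best, counts = select_pose_by_inliers(src, tgt, poses)
print("inlier counts:", counts, "selected pose index:", best)
```

The score depends entirely on the quality of the input correspondences, which is the dependence the abstract says Deep-PE aims to reduce.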

Paper Structure (35 sections, 9 equations, 14 figures, 9 tables)

Figures (14)

  • Figure 1: (a) Even in the well-known registration pipeline [Qin2022GeometricTF], the Inlier Ratio (IR) and Registration Recall (RR) deteriorate rapidly for point cloud pairs with an overlap below 30%. (b) On the low-overlap 3DLoMatch benchmark, Deep-PE shows a significant advantage over the statistics-based pose evaluators CC [Fischler1981RandomSC] and FS-TCD [chen2023sc] when the inlier ratio is low, and approaches the ground-truth value as the inlier ratio increases.
  • Figure 2: Illustration of our insight: Incorrect poses frequently align areas in the point clouds that should not be aligned; points within these regions typically display larger feature residuals. The fourth column depicts the overlapping regions predicted under the ground truth pose, while the fifth column illustrates the severity of feature residuals, with red colors denoting larger residuals. RRE: Relative Rotation Error. RTE: Relative Translation Error.
  • Figure 3: Deep-PE mainly consists of three components: (1) The pre-trained feature extractor downsamples the input point clouds and learns features at multiple resolution levels. The points and features of the coarsest and penultimate layers are cached and reused for each candidate pose $\mathcal{H}_i$. (2) The pose-aware attention module adjusts attention regions based on different candidate poses and embeds relevant features into coarse-level features. (3) The pose confidence prediction module first calculates the feature residuals before and after coarse-level feature updates for each point cloud, then combines them through concatenation and max-pooling operations. Finally, a Multilayer Perceptron (MLP) predicts the confidence score $\mathcal{S}_i$ associated with each candidate pose $\mathcal{H}_i$, and the pose with the maximum confidence score is selected as the final transformation.
  • Figure 4: Illustration of attention regions and scores under correct and incorrect poses. With the correct pose, attention scores are higher in regions resembling the query point, leading to smaller feature residuals before and after updating. Under an incorrect pose, the opposite holds.
  • Figure 5: Comparison of registration results between Deep-PE and statistics-based pose evaluators.
  • ...and 9 more figures
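
The RRE and RTE abbreviations used in the Figure 2 caption can be illustrated with a short sketch. It uses the standard definitions: RRE is the angle of the residual rotation between the estimated and ground-truth rotations, and RTE is the Euclidean distance between the two translations. The success thresholds in `is_successful` (15 degrees and 0.3 in the units of the point cloud) are assumptions for illustration; the captions above do not state the thresholds used in the paper.

```python
import numpy as np

def relative_rotation_error(R_est, R_gt):
    """RRE in degrees: rotation angle of the residual rotation R_gt^T R_est."""
    cos_angle = (np.trace(R_gt.T @ R_est) - 1.0) / 2.0
    # Clip to guard against values slightly outside [-1, 1] from rounding.
    return float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))

def relative_translation_error(t_est, t_gt):
    """RTE: Euclidean distance between estimated and ground-truth translations."""
    return float(np.linalg.norm(np.asarray(t_est) - np.asarray(t_gt)))

def is_successful(R_est, t_est, R_gt, t_gt, rre_max_deg=15.0, rte_max=0.3):
    """Hypothetical success test; both thresholds are assumed, not from the source."""
    return (relative_rotation_error(R_est, R_gt) <= rre_max_deg
            and relative_translation_error(t_est, t_gt) <= rte_max)

def rot_z(angle_deg):
    """Rotation matrix for a rotation of `angle_deg` degrees about the z-axis."""
    a = np.radians(angle_deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Example: the estimate is off by 10 degrees in rotation and 0.5 in translation.
R_gt, t_gt = np.eye(3), np.zeros(3)
R_est, t_est = rot_z(10.0), np.array([0.3, 0.4, 0.0])

rre = relative_rotation_error(R_est, R_gt)
rte = relative_translation_error(t_est, t_gt)
ok = is_successful(R_est, t_est, R_gt, t_gt)
print(f"RRE = {rre:.2f} deg, RTE = {rte:.2f}, success = {ok}")
```

Under these assumed thresholds the example pose passes the rotation check but fails the translation check, so it would count as a registration failure.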