Unlocking Generalization Power in LiDAR Point Cloud Registration

Zhenxuan Zeng, Qiao Wu, Xiyu Zhang, Lin Yuanbo Wu, Pei An, Jiaqi Yang, Ji Wang, Peng Wang

TL;DR

The paper tackles the generalization gap of LiDAR point cloud registration in cross-distance and cross-dataset scenarios. It proposes UGP, a pruned framework that eliminates cross-attention, adds a progressive self-attention module, and fuses Bird's Eye View (BEV) semantics to strengthen intra-frame features and reduce scene ambiguity. UGP reaches state-of-the-art mean Registration Recall in cross-distance tests on KITTI (94.5%) and nuScenes (91.4%) and in cross-dataset transfer from nuScenes to KITTI (90.9%), with robustness to sparsity and noise. The approach offers a practical, generalizable solution for LiDAR registration, with potential safety benefits for autonomous driving systems.

Abstract

In real-world environments, a LiDAR point cloud registration method with robust generalization capabilities (across varying distances and datasets) is crucial for ensuring safety in autonomous driving and other LiDAR-based applications. However, current methods fall short in achieving this level of generalization. To address these limitations, we propose UGP, a pruned framework designed to enhance generalization power for LiDAR point cloud registration. The core insight in UGP is the elimination of cross-attention mechanisms to improve generalization, allowing the network to concentrate on intra-frame feature extraction. Additionally, we introduce a progressive self-attention module to reduce ambiguity in large-scale scenes and integrate Bird's Eye View (BEV) features to incorporate semantic information about scene elements. Together, these enhancements significantly boost the network's generalization performance. We validated our approach through various generalization experiments in multiple outdoor scenes. In cross-distance generalization experiments on KITTI and nuScenes, UGP achieved state-of-the-art mean Registration Recall rates of 94.5% and 91.4%, respectively. In cross-dataset generalization from nuScenes to KITTI, UGP achieved a state-of-the-art mean Registration Recall of 90.9%. Code will be available at https://github.com/peakpang/UGP.
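The abstract reports results as Registration Recall, and the Figure 5 caption refers to RRE and RTE thresholds. As a minimal sketch of how such a metric is commonly computed, the following assumes that a pair counts as registered when both its relative rotation error (RRE) and relative translation error (RTE) fall below a threshold. The default thresholds (5 degrees, 2 m) and all function names are illustrative assumptions, not values taken from the paper.

```python
import numpy as np


def rotation_z(angle_deg):
    """Helper for the example: rotation about the z-axis by angle_deg degrees."""
    a = np.radians(angle_deg)
    return np.array([[np.cos(a), -np.sin(a), 0.0],
                     [np.sin(a),  np.cos(a), 0.0],
                     [0.0,        0.0,       1.0]])


def rotation_error_deg(R_est, R_gt):
    """RRE: geodesic angle between estimated and ground-truth rotations, in degrees."""
    cos_angle = (np.trace(R_est.T @ R_gt) - 1.0) / 2.0
    # Clip to guard against values slightly outside [-1, 1] from rounding.
    return float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))


def translation_error_m(t_est, t_gt):
    """RTE: Euclidean distance between estimated and ground-truth translations."""
    return float(np.linalg.norm(np.asarray(t_est) - np.asarray(t_gt)))


def registration_recall(estimates, ground_truths,
                        rre_thresh_deg=5.0, rte_thresh_m=2.0):
    """Fraction of pairs with RRE and RTE both below the (assumed) thresholds."""
    successes = 0
    for (R_est, t_est), (R_gt, t_gt) in zip(estimates, ground_truths):
        ok_rot = rotation_error_deg(R_est, R_gt) < rre_thresh_deg
        ok_trans = translation_error_m(t_est, t_gt) < rte_thresh_m
        successes += int(ok_rot and ok_trans)
    return successes / len(estimates)


# Two toy pairs: the first is nearly correct, the second is off by 10 degrees.
gt = [(np.eye(3), np.zeros(3)), (np.eye(3), np.zeros(3))]
est = [(rotation_z(1.0), np.array([0.5, 0.0, 0.0])),
       (rotation_z(10.0), np.zeros(3))]
recall = registration_recall(est, gt)
print(recall)
```

A mean Registration Recall would then presumably average this value over the evaluated test settings, such as the different distances; the source does not spell out the averaging here.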

Paper Structure

This paper contains 22 sections, 11 equations, 10 figures, 10 tables.

Figures (10)

  • Figure 1: Illustration of the generalization performance of leading methods [yu2021cofinet, qin2022geometric, yao2024pare] in cross-distance and cross-dataset settings. (a) Cross-distance generalization: train on KITTI@10m, test on KITTI@40m. (b) Cross-dataset generalization: train on nuScenes@10m [nuscenes], test on KITTI@10m. These methods experience substantial performance degradation due to poor generalization. In contrast, our method achieves the best registration results in both the cross-distance and cross-dataset settings.
  • Figure 2: Motivation. We conducted a statistical analysis of the matching characteristics in LiDAR scenes and identified the limitations of existing methods. This was followed by preliminary visualizations and elimination experiments to validate our findings. (a) Each point in the figure represents a ground truth corresponding superpoint pair. The position of each point indicates the neighborhood count (NC) of the superpoint within a radius of $r = 2.4\,\mathrm{m}$ in both the source (src) and target (tgt) point clouds, and the color represents the overlap degree of the corresponding superpoint pairs after rotation by the ground truth transformation. (b) Performance of existing methods in cross-distance (upper left) and cross-dataset (upper right). (c) GeoTrans [qin2022geometric] cannot match superpoints (cross-distance, from KITTI@10m to KITTI@20m). (d) The performance of existing methods after eliminating cross-attention, and UGP (ours).
  • Figure 3: Overview of UGP. The input point cloud is first projected to obtain the corresponding BEV image. Then, the point cloud and BEV image are fed into the Point Encoder and BEV encoder, respectively, for downsampling and feature extraction. In the superpoint matching stage, the indexing relationship between superpoints and BEV is used to fuse point-level features and image features. The fused features are used as input to the progressive self-attention module to extract robust and consistent intra-frame features. Finally, the superpoint matching results are propagated to dense points for dense matching, enabling the recovery of the rigid transformation.
  • Figure 4: Illustration of our proposed progressive self-attention. In the initial layer, the triangular points prioritize attention to the local space around them, and the attention range is gradually expanded in subsequent layers.
  • Figure 5: Cross-distance registration recall under different RRE and RTE thresholds on KITTI@30m and nuScenes@30m.
  • ...and 5 more figures
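The Figure 4 caption describes progressive self-attention as starting with a local attention range that widens in later layers. One plausible reading is sketched below with a hard distance mask whose radius grows per layer. This is an illustration under stated assumptions, not the authors' implementation: the radius schedule, the function names, the residual update, and the omission of learned query/key/value projections are all hypothetical simplifications.

```python
import numpy as np


def masked_self_attention(features, coords, radius):
    """Single-head self-attention restricted to points within `radius`.

    features: (N, D) per-superpoint features; coords: (N, 3) positions.
    Learned projections are omitted to keep the sketch short (assumption).
    """
    dim = features.shape[1]
    scores = features @ features.T / np.sqrt(dim)            # (N, N) logits
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    scores = np.where(dists <= radius, scores, -np.inf)      # mask far points
    # Each point is at distance 0 from itself, so every row keeps a finite max.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)   # row-wise softmax
    return weights @ features, weights


def progressive_self_attention(features, coords, radii):
    """Apply masked attention layer by layer with an expanding radius."""
    for radius in radii:
        attended, _ = masked_self_attention(features, coords, radius)
        features = features + attended                        # residual update
    return features


# Toy example: two clusters of superpoints 10 m apart along the x-axis.
coords = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0],
                   [10.0, 0.0, 0.0], [11.0, 0.0, 0.0]])
feats = np.random.default_rng(0).normal(size=(4, 8))

_, local_weights = masked_self_attention(feats, coords, radius=2.0)
out = progressive_self_attention(feats, coords, radii=[2.0, 5.0, np.inf])
print(local_weights.round(2))
print(out.shape)
```

In the first layer of this toy schedule each point attends only within its own cluster; the final radius of `np.inf` makes the last layer equivalent to global self-attention.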