BUFFER-X: Towards Zero-Shot Point Cloud Registration in Diverse Scenes

Minkyun Seo, Hyungtae Lim, Kanghee Lee, Luca Carlone, Jaesik Park

TL;DR

BUFFER-X tackles zero-shot point cloud registration by diagnosing three generalization bottlenecks: dependence on environment-specific voxel sizes and search radii, brittle out-of-domain keypoint detectors, and unnormalized coordinates. It introduces a detector-free, multi-scale patch-based descriptor pipeline with geometric bootstrapping to adapt voxel sizes, density-aware radii, and PCA-based reference axes, along with a hierarchical cross-scale inlier search for robust pose estimation. A comprehensive 11-dataset benchmark demonstrates strong zero-shot generalization without tuning, underscoring practical deployment potential across diverse sensors and environments. The work also provides detailed parameter guidance and code, enabling reproducible evaluation and adoption in real-world robotics and perception tasks.

Abstract

Recent advances in deep learning-based point cloud registration have improved generalization, yet most methods still require retraining or manual parameter tuning for each new environment. In this paper, we identify three key factors limiting generalization: (a) reliance on environment-specific voxel size and search radius, (b) poor out-of-domain robustness of learning-based keypoint detectors, and (c) raw coordinate usage, which exacerbates scale discrepancies. To address these issues, we present a zero-shot registration pipeline called BUFFER-X by (a) adaptively determining voxel size/search radii, (b) using farthest point sampling to bypass learned detectors, and (c) leveraging patch-wise scale normalization for consistent coordinate bounds. In particular, we present a multi-scale patch-based descriptor generation and a hierarchical inlier search across scales to improve robustness in diverse scenes. We also propose a novel generalizability benchmark using 11 datasets that cover various indoor/outdoor scenarios and sensor modalities, demonstrating that BUFFER-X achieves substantial generalization without prior information or manual parameter tuning for the test datasets. Our code is available at https://github.com/MIT-SPARK/BUFFER-X.
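Step (b) of the abstract replaces a learned keypoint detector with farthest point sampling (FPS). The following is a minimal NumPy sketch of generic greedy FPS, not the authors' implementation; the function name `farthest_point_sampling`, the `start_index` parameter, and the brute-force distance update are assumptions made for illustration.

```python
import numpy as np


def farthest_point_sampling(points: np.ndarray, num_samples: int,
                            start_index: int = 0) -> np.ndarray:
    """Greedy FPS: return indices of `num_samples` mutually distant points.

    Hypothetical helper for illustration; `points` has shape (N, 3).
    """
    n = points.shape[0]
    if not 0 < num_samples <= n:
        raise ValueError("num_samples must be between 1 and the number of points")

    selected = np.empty(num_samples, dtype=np.int64)
    selected[0] = start_index

    # Squared distance from each point to its nearest already-selected point.
    min_sq_dist = np.full(n, np.inf)
    for i in range(1, num_samples):
        diff = points - points[selected[i - 1]]
        sq_dist = np.einsum("ij,ij->i", diff, diff)
        np.minimum(min_sq_dist, sq_dist, out=min_sq_dist)
        # The next sample is the point farthest from the current selection.
        selected[i] = int(np.argmax(min_sq_dist))
    return selected


# Usage: pick 3 well-spread points from a small collinear cloud.
cloud = np.array([[0.0, 0.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [10.0, 0.0, 0.0],
                  [5.0, 0.0, 0.0]])
keypoint_indices = farthest_point_sampling(cloud, num_samples=3)
```

Because the selection depends only on point geometry, this kind of sampling involves no trained parameters that could fail to transfer to a new sensor or environment.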

Paper Structure

This paper contains 25 sections, 13 equations, 14 figures, 4 tables.

Figures (14)

  • Figure 1: Success rate (unit: %) of zero-shot point cloud registration with state-of-the-art approaches on 11 datasets [Geiger13ijrr-KITTI, Zeng17cvpr-3dmatch, Jung23ijrr-HeLiPR, Huang21cvpr-PREDATORRegistration, Yeshwanth23iccv-Scannet++, Qingqing22iros-TIERS, Ramezani20iros-NewerCollege, Sun20cvpr-WaymoDataset, Tian23iros-KimeraMultiExperiments, Pomerleau12ijrr-ETH]. Without any prior information or manual parameter tuning for the test datasets, our BUFFER-X shows robust generalization capability across diverse scenes even though the network is only trained on the 3DMatch dataset [Zeng17cvpr-3dmatch].
  • Figure 2: (a) Variation in the number of points after voxelization with different voxel sizes $v$ across datasets. Even in indoor scenes, point counts vary significantly depending on the sensor type (i.e., TIERS [Qingqing22iros-TIERS] vs. 3DMatch [Zeng17cvpr-3dmatch]). Notably, TIERS and KITTI [Geiger13ijrr-KITTI], both using omnidirectional LiDARs, yield different point densities due to indoor vs. outdoor environments. (b) Empirical distribution of the datasets' maximum range.
  • Figure 3: Overview of our BUFFER-X, which mainly consists of three steps. (a) Geometric bootstrapping (\ref{sec:geometric}) to determine the appropriate voxel size and radii for the given source $\mathcal{P}$ and target $\mathcal{Q}$ clouds. (b) Multi-scale patch embedder (\ref{sec:tri-scale-patch}) to generate patch-wise descriptors $\mathcal{S}_\xi$ for multiple scales $\xi \in \{l, m, g\}$, where $l$, $m$, and $g$ represent local, middle, and global scales, respectively. Specifically, Mini-SpinNet [Ao23CVPR-BUFFER] outputs cylindrical feature maps $\mathcal{C}_\xi$ and vector feature set $\mathcal{F}_\xi$. (c) Hierarchical inlier search (\ref{sec:hierarchical}), which first performs nearest neighbor-based intra-scale matching using $\mathcal{F}^\mathcal{P}_\xi$ and $\mathcal{F}^\mathcal{Q}_\xi$ at each scale, followed by pairwise transformation estimation. Finally, it identifies globally consistent inliers $\mathcal{I}$ across all scales to refine correspondences based on consensus maximization [Sun22ral-TriVoC, Zhang24tpami-AcceleratingGloballyCM].
  • Figure 4: (a) Visual description of local ($r_l$), middle ($r_m$), and global ($r_g$) radii for the same point to illustrate scale differences and (b) normalized patches whose coordinates lie in $[-1, 1]$. Note that their reference frames follow the eigenvectors obtained from principal component analysis (PCA) [Lim21ral-Patchwork, Alexiou24jivp-PointPCA]. The $z$-axis is assigned to the eigenvector ${\bm v}_3$, which corresponds to the smallest eigenvalue.
  • Figure 5: Relative translation error (RTE) and relative rotation error (RRE) of our approach compared with state-of-the-art methods, all trained on 3DMatch and tested on KITTI, with oracle tuning and scale alignment, corresponding to those in \ref{table:success_rates} under the + + setting. The **** annotations indicate measurements with a $p$-value $< 10^{-4}$ after a paired $t$-test.
  • ...and 9 more figures
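
The patch normalization described in the Figure 4 caption can be sketched as follows. This is an illustrative NumPy version under stated assumptions, not the authors' code: the helper names `pca_reference_frame` and `normalize_patch` are hypothetical, the covariance is taken about the patch mean, and the sign of ${\bm v}_3$ is flipped only to keep the frame right-handed. The caption itself specifies only that the axes come from PCA, that the $z$-axis is the eigenvector with the smallest eigenvalue, and that normalized coordinates lie in $[-1, 1]$.

```python
import numpy as np


def pca_reference_frame(patch: np.ndarray) -> np.ndarray:
    """Return a 3x3 matrix whose columns are v1, v2, v3.

    Columns are ordered by descending eigenvalue, so the last column (v3,
    smallest eigenvalue) plays the role of the z-axis.
    Assumption: covariance is computed about the patch mean.
    """
    if patch.shape[0] < 3:
        raise ValueError("need at least 3 points to estimate a frame")
    centered = patch - patch.mean(axis=0)
    covariance = centered.T @ centered / patch.shape[0]
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    _, eigenvectors = np.linalg.eigh(covariance)
    axes = eigenvectors[:, ::-1].copy()
    # Assumption: flip v3 if needed so the frame is right-handed.
    if np.linalg.det(axes) < 0:
        axes[:, 2] *= -1.0
    return axes


def normalize_patch(points: np.ndarray, center: np.ndarray,
                    radius: float) -> np.ndarray:
    """Crop a spherical patch, scale it by `radius`, and rotate it into its PCA frame."""
    distances = np.linalg.norm(points - center, axis=1)
    neighbors = points[distances <= radius]
    # Dividing by the radius maps the patch into the unit ball, so every
    # coordinate lies in [-1, 1] regardless of the original metric scale.
    local = (neighbors - center) / radius
    axes = pca_reference_frame(neighbors)
    # Rotation preserves norms, so the [-1, 1] bound still holds afterwards.
    return local @ axes


# Usage: a flat rectangular patch in the xy-plane, elongated along x.
xs, ys = np.meshgrid(np.linspace(-2.0, 2.0, 9), np.linspace(-1.0, 1.0, 5))
plane = np.column_stack([xs.ravel(), ys.ravel(), np.zeros(xs.size)])
frame = pca_reference_frame(plane)
normalized = normalize_patch(plane, center=np.zeros(3), radius=3.0)
```

For this planar example the third axis aligns with the plane normal, which is the behavior the caption describes for ${\bm v}_3$. Running the same function with the three radii $r_l$, $r_m$, and $r_g$ would yield three patches of different physical extent that all share the same coordinate bounds.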