
BiEquiFormer: Bi-Equivariant Representations for Global Point Cloud Registration

Stefanos Pertigkiozoglou, Evangelos Chatzipantazis, Kostas Daniilidis

TL;DR

BiEquiFormer tackles global point cloud registration (PCR) under arbitrary input poses by enforcing $SE(3)\times SE(3)$ bi-equivariance in a detector-free transformer that fuses the two point clouds through invariant and vector features. The method uses a coarse-to-fine pipeline with bi-equivariant intra- and cross-attention, a fine matching stage based on optimal transport (OT), and optional iterative refinement. It achieves competitive performance in the canonical setting and superior robustness on the 3DMatch and 3DLoMatch datasets. The work shows that explicit bi-equivariance improves pose-consistent registration at scene scale, with implications for robust SLAM and robotic manipulation, while acknowledging memory overhead and a remaining gap in the canonical setting. By embedding symmetry into both feature extraction and matching, BiEquiFormer enables reliable registration across diverse initial configurations.

Abstract

The goal of this paper is to address the problem of global point cloud registration (PCR), i.e., finding the optimal alignment between point clouds irrespective of the initial poses of the scans. This problem is notoriously challenging for classical optimization methods due to computational constraints. First, we show that state-of-the-art deep learning methods suffer severe performance degradation when the point clouds are arbitrarily placed in space. We propose that equivariant deep learning should be used to solve this task, and we characterize the specific type of bi-equivariance of PCR. Then, we design BiEquiFormer, a novel and scalable bi-equivariant pipeline, i.e., one that is equivariant to independent transformations of the input point clouds. While a naive approach would process the point clouds independently, we design expressive bi-equivariant layers that fuse the information from both point clouds. This allows us to extract high-quality superpoint correspondences and, in turn, obtain robust point cloud registration. Extensive comparisons against state-of-the-art methods show that our method achieves comparable performance in the canonical setting and superior performance in the robust setting on both the 3DMatch and the challenging low-overlap 3DLoMatch datasets.

Paper Structure (24 sections, 8 theorems, 22 equations, 4 figures, 2 tables)


Key Result

Proposition 3.1

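PCR is output $SE(3)$-bi-equivariant, i.e., for all $(\mathcal{T}_1,\mathcal{T}_2) \in SE(3) \times SE(3)$: $\mathrm{PCR}(\mathcal{T}_1X^r, \mathcal{T}_2Y^s) = \mathcal{T}_1\mathcal{T}_s^r \mathcal{T}_2^{-1}$, where $\mathcal{T}_s^r = \mathrm{PCR}(X^r, Y^s)$ is the transformation that aligns the source cloud $Y^s$ to the reference cloud $X^r$.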
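
The property can be checked numerically. The following is a minimal sketch, not the BiEquiFormer pipeline: it assumes noise-free clouds with known row-wise correspondences and uses a Kabsch (Procrustes) least-squares solver as a stand-in for PCR. The helper names `random_se3`, `apply_se3`, and `kabsch` are hypothetical and introduced only for this illustration.

```python
import numpy as np

def random_se3(rng):
    """Sample a rigid transform as a 4x4 homogeneous matrix (illustrative helper)."""
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:          # flip one axis so that det = +1 (proper rotation)
        q[:, 0] *= -1
    T = np.eye(4)
    T[:3, :3] = q
    T[:3, 3] = rng.normal(size=3)
    return T

def apply_se3(T, P):
    """Apply a 4x4 rigid transform to an (N, 3) array of points."""
    return P @ T[:3, :3].T + T[:3, 3]

def kabsch(X, Y):
    """Least-squares rigid T with X ~ T Y, assuming row i of X matches row i of Y."""
    cx, cy = X.mean(axis=0), Y.mean(axis=0)
    H = (Y - cy).T @ (X - cx)                      # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))         # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = cx - R @ cy
    return T

rng = np.random.default_rng(0)
Y = rng.normal(size=(50, 3))          # source cloud Y^s
T_sr = random_se3(rng)                # ground-truth alignment T_s^r
X = apply_se3(T_sr, Y)                # reference cloud X^r = T_s^r Y^s

T1, T2 = random_se3(rng), random_se3(rng)
lhs = kabsch(apply_se3(T1, X), apply_se3(T2, Y))   # PCR(T1 X^r, T2 Y^s)
rhs = T1 @ T_sr @ np.linalg.inv(T2)                # T1 T_s^r T2^{-1}
print("max abs difference:", np.abs(lhs - rhs).max())
```

With exact correspondences the two sides agree up to floating-point error. A learned pipeline has no such guarantee unless its layers are built to respect the symmetry, which is the motivation for a bi-equivariant design.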

Figures (4)

  • Figure 1: Inlier Ratios (IR) and Registration Metrics (RRE, RTE, RMSE) for two pairs of low-overlap scans that differ only by their relative pose. (Left) Both GeoTransformer (a state-of-the-art method) and BiEquiFormer recover the correct registration with high IR. (Right) GeoTransformer fails to find good matches (low IR) in this relative pose and predicts an incorrect registration. In contrast, BiEquiFormer is designed to perform consistently irrespective of the initial point cloud poses.
  • Figure 2: BiEquiFormer is an attention-based bi-equivariant pipeline for global PCR. First, equivariant intra-point self-attention and inter-point cross-attention layers update the scalar and vector features on the points. Then a bi-equivariant feature is used to align the input vectors to the same frame before applying equivariant cross-attention. The output invariant coarse features are used to extract a set of candidate coarse matches which are processed by a fine point matching module to extract a candidate transformation. A final estimate is computed using a local-to-global transformation scheme. After the first transformation is estimated (Global Step) we can apply BiEquiFormer iteratively by switching the bi-equivariant frame alignment block with the current rotation estimation (Local Step).
  • Figure 3: Registration Recall and Inlier Ratio for GeoTransformer [qin2022geometric], CoFiNet [yu2021cofinet] and Predator [Huang_2021_CVPR] on different overlap ranges of the total 3DMatch [zeng20163dmatch]. The green lines (mean original) show the mean per overlap range for the original dataset. The blue lines (mean augmented) show the mean per overlap range of an augmented dataset in which each point cloud has been uniformly roto-translated, creating a total of 54 configurations per pair. The red lines (robust augmented) show the mean per overlap range of the minimum across the 54 different configurations. The total mean across all pairs in the dataset for each case is also shown in the plot.
  • Figure 4: Registration results achieved by our method compared to the ground truth alignment.
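
The "mean augmented" and "robust augmented" curves of Figure 3 differ only in how per-pair scores are aggregated over pose configurations. The sketch below illustrates that aggregation under stated assumptions: the score matrix, overlap values, and bin edges are made-up placeholders (3 configurations per pair instead of 54) and are not results from the paper, and the function name `aggregate` is hypothetical.

```python
import numpy as np

# Placeholder scores for illustration only (NOT results from the paper):
# rows = scan pairs, columns = pose configurations of the same pair.
scores = np.array([
    [0.9, 0.8, 0.7],
    [0.6, 0.2, 0.4],
    [0.5, 0.5, 0.1],
    [0.3, 0.9, 0.6],
])
overlap = np.array([0.15, 0.25, 0.45, 0.55])   # assumed overlap ratio of each pair
bin_edges = np.array([0.1, 0.3, 0.6])          # assumed overlap ranges: [0.1, 0.3), [0.3, 0.6)

def aggregate(scores, overlap, bin_edges):
    """Return per-bin (mean augmented, robust augmented) scores."""
    bins = np.digitize(overlap, bin_edges) - 1  # bin index of each pair
    mean_aug, robust_aug = [], []
    for b in range(len(bin_edges) - 1):
        in_bin = scores[bins == b]
        mean_aug.append(in_bin.mean())                 # average over pairs and configurations
        robust_aug.append(in_bin.min(axis=1).mean())   # worst configuration per pair, then average
    return np.array(mean_aug), np.array(robust_aug)

mean_aug, robust_aug = aggregate(scores, overlap, bin_edges)
print("mean augmented:  ", mean_aug)
print("robust augmented:", robust_aug)
```

The robust score can never exceed the mean score, so the gap between the two curves measures how much a method's output depends on the initial poses.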

Theorems & Definitions (16)

  • Proposition 3.1
  • Proposition 3.2
  • Proposition 3.3
  • Proposition 4.1
  • Proposition 4.2
  • Proposition 7.1
  • proof
  • Proof of Proposition \ref{prop:pcr_bieq}
  • Proof of Proposition \ref{prop:eq_flip}
  • Proof of Proposition \ref{prop:eq_ord}
  • ...and 6 more