
Equivariant Local Reference Frames for Unsupervised Non-rigid Point Cloud Shape Correspondence

Ling Wang, Runfa Chen, Yikai Wang, Fuchun Sun, Xinzhou Wang, Sun Kai, Guangyuan Fu, Jianwei Zhang, Wenbing Huang

TL;DR

This work tackles unsupervised non-rigid point cloud shape correspondence under SE(3) pose variability by decoupling global pose through pairwise SE(3)-equivariant Local Reference Frames (LRFs). It introduces EquiShape, a Cross-GVP–driven framework that learns pairwise independent SE(3)-equivariant LRFs with global context, and LRF-Refine, an inference-time gradient-based optimization that adapts LRFs to unseen contexts to improve generalization. Training combines cross- and self-construction losses with a mapping objective, producing invariant similarity descriptors that yield robust correspondences across diverse datasets. Empirical results show substantial accuracy gains on SHREC'19, CAPE, and cross-dataset scenarios, and the approach demonstrates strong generalization and efficiency, with code and models to be released.

Abstract

Unsupervised non-rigid point cloud shape correspondence underpins a multitude of 3D vision tasks, yet is itself non-trivial because of the exponential complexity arising from inter-point degrees of freedom, i.e., pose transformations. Under the assumption of local rigidity, one way to reduce this complexity is to decompose the overall shape into independent local regions using Local Reference Frames (LRFs) that are invariant to SE(3) transformations. However, focusing solely on local structure neglects global geometric context, which yields less distinctive LRFs that lack the semantic information needed for effective matching. Furthermore, this complexity introduces out-of-distribution geometric contexts at inference time, which complicates generalization. To this end, we introduce 1) EquiShape, a novel architecture tailored to learn pair-wise LRFs with global structural cues for both spatial and semantic consistency, and 2) LRF-Refine, an optimization strategy generally applicable to LRF-based methods that addresses these generalization challenges. Specifically, EquiShape employs cross-talk within separate equivariant graph neural networks (Cross-GVP) to build long-range dependencies that compensate for the lack of semantic information in local structure modeling, deducing pair-wise independent SE(3)-equivariant LRF vectors for each point. LRF-Refine adjusts the LRFs to specific contexts and knowledge at inference time, enhancing the geometric and semantic generalizability of point features. Our overall framework surpasses state-of-the-art methods by a large margin on three benchmarks. Code and models will be publicly available.
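Only the mechanics of LRF-Refine are described here: network parameters stay frozen and gradients flow back to the LRFs (Figure 3). The sketch below illustrates that pattern under heavy assumptions, so it is not the paper's loss or optimizer. The frozen network is replaced by a hypothetical fixed weight matrix `W` followed by `tanh`. The initial LRF vectors come from hand-crafted equivariant cues (`init_lrf_vectors`, hypothetical) rather than Cross-GVP. The unsupervised objective is a hypothetical cycle-consistency loss between soft maps. Gradients are taken by central finite differences instead of back-propagation.

```python
import numpy as np

def gram_schmidt(u, v):
    """GS(u, v) row-wise: (N, 3) vectors -> (N, 3, 3) frames with columns e1, e2, e3."""
    e1 = u / np.linalg.norm(u, axis=1, keepdims=True)
    w = v - np.sum(v * e1, axis=1, keepdims=True) * e1
    e2 = w / np.linalg.norm(w, axis=1, keepdims=True)
    return np.stack([e1, e2, np.cross(e1, e2)], axis=2)

def knn(points, k):
    """Indices of the k nearest neighbours of every point, excluding the point itself."""
    dist = np.linalg.norm(points[:, None] - points[None], axis=2)
    return np.argsort(dist, axis=1)[:, 1:k + 1]

def init_lrf_vectors(points, nbrs):
    """Hypothetical stand-in for Cross-GVP output: local mean offset and centroid offset."""
    return (points[nbrs] - points[:, None]).mean(axis=1), points - points.mean(axis=0)

def point_features(points, nbrs, u, v, W):
    """Hypothetical frozen encoder: neighbour offsets in the LRF, then fixed weights W and tanh."""
    local = np.einsum('nkc,ncd->nkd', points[nbrs] - points[:, None], gram_schmidt(u, v))
    return np.tanh(local.reshape(len(points), -1) @ W)

def soft_map(fa, fb, tau=1.0):
    """Row-stochastic soft correspondence from feature similarity (stable softmax)."""
    s = fa @ fb.T / tau
    p = np.exp(s - s.max(axis=1, keepdims=True))
    return p / p.sum(axis=1, keepdims=True)

def num_grad(f, x, eps=1e-6):
    """Central finite differences, used here in place of back-propagation."""
    g = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        g[i] = (f(x + step) - f(x - step)) / (2 * eps)
    return g

def refine_lrf(loss, theta, steps=10, lr=1.0):
    """Gradient descent on the LRF vectors only; the encoder weights W are never updated."""
    best = loss(theta)
    for _ in range(steps):
        g = num_grad(loss, theta)
        lr_try = lr
        for _ in range(20):                 # backtracking: accept only steps that lower the loss
            cand = theta - lr_try * g
            val = loss(cand)
            if val < best:
                theta, best = cand, val
                break
            lr_try *= 0.5
        else:
            break                           # no descent step found; stop early
    return theta, best

rng = np.random.default_rng(0)
n, k, dim = 12, 4, 8
x = rng.normal(size=(n, 3))
y = x + 0.05 * rng.normal(size=(n, 3))          # "unseen" target: a slightly deformed copy of x
nx, ny = knn(x, k), knn(y, k)
W = rng.normal(size=(3 * k, dim)) / np.sqrt(3 * k)  # frozen (locked) encoder weights
uy, vy = init_lrf_vectors(y, ny)                # target LRFs are kept fixed in this sketch
fy = point_features(y, ny, uy, vy, W)

def cycle_loss(theta):
    """Hypothetical unsupervised objective: x->y and y->x soft maps should compose to identity."""
    u, v = theta[:3 * n].reshape(n, 3), theta[3 * n:].reshape(n, 3)
    fx = point_features(x, nx, u, v, W)
    cycle = soft_map(fx, fy) @ soft_map(fy, fx)
    return np.sum((cycle - np.eye(n)) ** 2)

theta0 = np.concatenate([a.ravel() for a in init_lrf_vectors(x, nx)])
theta, refined = refine_lrf(cycle_loss, theta0)
print(f"cycle loss before: {cycle_loss(theta0):.4f}, after: {refined:.4f}")
```

Because the line search accepts only loss-decreasing steps, the refined loss can never exceed the initial one. Freezing `W` and optimizing only `theta` mirrors the "locked parameters, gradients to the LRFs" idea, whatever the real objective is.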

Paper Structure

This paper contains 35 sections, 5 theorems, 39 equations, 9 figures, and 11 tables.

Key Result

Proposition: EquiShape satisfies the SE(3)-invariance constraints stated in the paper.
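The property behind this proposition can be checked numerically for any pipeline that builds LRFs from SE(3)-equivariant vectors. The sketch below is an illustration, not EquiShape itself. It replaces Cross-GVP with two hypothetical hand-crafted equivariant vectors per point: the mean offset to the k nearest neighbours as a local cue, and the offset from the centroid as a global cue. It then builds $\mathbf{O}_i$ with Gram-Schmidt and verifies two things: the frames rotate with the shape, and neighbour coordinates expressed in those frames do not change.

```python
import numpy as np

def gram_schmidt(u, v):
    """GS(u, v) for batches of vectors (N, 3) -> frames (N, 3, 3) whose columns are e1, e2, e3."""
    e1 = u / np.linalg.norm(u, axis=1, keepdims=True)
    w = v - np.sum(v * e1, axis=1, keepdims=True) * e1   # remove the e1 component of v
    e2 = w / np.linalg.norm(w, axis=1, keepdims=True)
    e3 = np.cross(e1, e2)                                # right-handed, so det(O_i) = +1
    return np.stack([e1, e2, e3], axis=2)

def lrf_vectors(points, k=6):
    """Hypothetical stand-in for Cross-GVP: two SE(3)-equivariant vectors per point."""
    dist = np.linalg.norm(points[:, None] - points[None], axis=2)
    nbrs = np.argsort(dist, axis=1)[:, 1:k + 1]          # k nearest neighbours, self excluded
    u = (points[nbrs] - points[:, None]).mean(axis=1)    # local cue: mean offset to neighbours
    v = points - points.mean(axis=0)                     # global cue: offset from the centroid
    return u, v, nbrs

def local_descriptors(points, k=6):
    """Neighbour offsets expressed in each point's LRF, O_i^T (x_j - x_i), flattened per point."""
    u, v, nbrs = lrf_vectors(points, k)
    offsets = points[nbrs] - points[:, None]             # (N, k, 3)
    return np.einsum('nkc,ncd->nkd', offsets, gram_schmidt(u, v)).reshape(len(points), -1)

def random_rigid(rng):
    """Random proper rotation R (det +1) and translation t."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q, rng.normal(size=3)

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 3))
R, t = random_rigid(rng)
x_moved = x @ R.T + t                                    # x_i -> R x_i + t

u, v, _ = lrf_vectors(x)
u_m, v_m, _ = lrf_vectors(x_moved)
frames, frames_moved = gram_schmidt(u, v), gram_schmidt(u_m, v_m)
print(np.allclose(frames_moved, R @ frames))                          # True: LRFs are equivariant
print(np.allclose(local_descriptors(x_moved), local_descriptors(x)))  # True: descriptors are invariant
```

The invariance follows from $(R\mathbf{O}_i)^\top R(x_j - x_i) = \mathbf{O}_i^\top (x_j - x_i)$. Any learned vectors that rotate with the input, such as the outputs of an equivariant network, inherit the same guarantee.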

Figures (9)

  • Figure 1: Illustration of our insight, which includes: 1) The challenge of exponential variations in shape deformation and orientation caused by inter-point pose transformations; 2) EquiShape addresses this challenge by decomposing the overall shape into independent local regions using LRFs invariant to SE(3) transformations and builds long-range dependencies to compensate for the lack of semantic information in local structure modeling; 3) LRF-Refine optimizes LRFs for adaptation to out-of-distribution geometric contexts, which inevitably arise from this challenge during inference.
  • Figure 2: Illustrative flowchart of EquiShape. 3D Geometric Graph is a graph equipped with the attribute $\vec{\mathbf{Z}}$, signifying 3D geometric vector features (e.g., global positions, relative directions) that are steerable via $\text{SE}(3)$ transformations. The notation $\mathbf{h}$ denotes non-steerable scalar features, e.g., distances, embedding features. $\vec{\mathbf{u}}_i, \vec{\mathbf{v}}_i$ are the pair-wise independent SE(3)-equivariant LRF vectors output by Cross-GVP. The function $\operatorname{GS}(\cdot,\cdot)$ constructs the per-point pairwise LRF $\mathbf{O}_i$ by applying the Gram-Schmidt orthogonalization process to these LRF vectors.
  • Figure 3: Illustrative flowchart of LRF-Refine. Dashed lines represent back-propagating gradients, while locks indicate that parameters are frozen.
  • Figure 4: Correspondence accuracy at various error tolerances.
  • Figure 5: Visualizations of the correspondence results from the SHREC'19 (top) and TOSCA (bottom) test sets.
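Figure 4 reports correspondence accuracy as a function of error tolerance. A common way to compute such a curve is the fraction of source points whose predicted match lies within a tolerance r of the ground-truth match. The sketch below makes two assumptions: the error is the Euclidean distance on the target shape, and `scale` is an optional, hypothetical normalizer (for example a shape diameter). The paper may use a geodesic distance or a different normalization.

```python
import numpy as np

def accuracy_curve(pred_idx, gt_idx, target_points, tolerances, scale=1.0):
    """Fraction of source points whose predicted target match is within each tolerance.

    pred_idx, gt_idx: (N,) indices into target_points for predicted / true matches.
    scale: hypothetical normaliser for the error (e.g. a shape diameter); 1.0 = raw distance.
    """
    err = np.linalg.norm(target_points[pred_idx] - target_points[gt_idx], axis=1) / scale
    return np.array([np.mean(err <= r) for r in tolerances])

# Toy usage: four target points on a line; two of the four predictions are wrong.
target = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [3.0, 0.0, 0.0]])
acc = accuracy_curve(np.array([0, 2, 2, 0]), np.array([0, 1, 2, 3]), target, [0.0, 1.0, 2.0, 3.0])
print(acc.tolist())  # [0.5, 0.75, 0.75, 1.0]
```

The errors here are 0, 1, 0 and 3, so the curve rises from 0.5 at zero tolerance to 1.0 once the tolerance covers the worst mistake.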

Theorems & Definitions (12)

  • Definition: $\text{SE}(3)$-equivariance
  • Definition: LRF
  • Proposition
  • Proof
  • Theorem
  • Proof
  • Theorem
  • Proof
  • Theorem
  • Proof
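For reference, the SE(3)-equivariance and LRF definitions listed above are commonly written as below, using the symbols from the Figure 2 caption. These are standard forms assumed for illustration, not the paper's exact statements. The last line shows why coordinates expressed in an equivariant LRF are SE(3)-invariant.

```latex
% Standard forms (assumed); a rigid motion g = (R, t), R \in SO(3), t \in \mathbb{R}^3,
% acts on a point cloud as g \cdot \mathbf{X} = \{ R x_k + t \}_k.
\begin{aligned}
&\text{Equivariant LRF vectors:} && \vec{\mathbf{u}}_i(g \cdot \mathbf{X}) = R\,\vec{\mathbf{u}}_i(\mathbf{X}), \qquad \vec{\mathbf{v}}_i(g \cdot \mathbf{X}) = R\,\vec{\mathbf{v}}_i(\mathbf{X}) \\
&\text{Gram-Schmidt LRF:} && e_1 = \frac{\vec{\mathbf{u}}_i}{\lVert \vec{\mathbf{u}}_i \rVert}, \quad
  e_2 = \frac{\vec{\mathbf{v}}_i - \langle \vec{\mathbf{v}}_i, e_1 \rangle e_1}{\lVert \vec{\mathbf{v}}_i - \langle \vec{\mathbf{v}}_i, e_1 \rangle e_1 \rVert}, \quad
  e_3 = e_1 \times e_2, \quad \mathbf{O}_i = \operatorname{GS}(\vec{\mathbf{u}}_i, \vec{\mathbf{v}}_i) = [\, e_1 \ e_2 \ e_3 \,] \in SO(3) \\
&\text{Equivariant frame:} && \mathbf{O}_i(g \cdot \mathbf{X}) = R\,\mathbf{O}_i(\mathbf{X}) \quad (\text{since } Ra \times Rb = R(a \times b) \text{ for } R \in SO(3)) \\
&\text{Invariant local coordinates:} && \mathbf{O}_i(g \cdot \mathbf{X})^\top \big( (R x_j + t) - (R x_i + t) \big) = \mathbf{O}_i(\mathbf{X})^\top R^\top R \,(x_j - x_i) = \mathbf{O}_i(\mathbf{X})^\top (x_j - x_i)
\end{aligned}
```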