Equivariant Local Reference Frames for Unsupervised Non-rigid Point Cloud Shape Correspondence
Ling Wang, Runfa Chen, Yikai Wang, Fuchun Sun, Xinzhou Wang, Sun Kai, Guangyuan Fu, Jianwei Zhang, Wenbing Huang
TL;DR
This work tackles unsupervised non-rigid point cloud shape correspondence under SE(3) pose variability by decoupling global pose through pairwise SE(3)-equivariant Local Reference Frames (LRFs). It introduces EquiShape, a Cross-GVP–driven framework that learns pairwise independent SE(3)-equivariant LRFs with global context, and LRF-Refine, an inference-time gradient-based optimization that adapts LRFs to unseen contexts to improve generalization. Training combines cross- and self-construction losses with a mapping objective, producing invariant similarity descriptors that yield robust correspondences across diverse datasets. Empirical results show substantial accuracy gains on SHREC'19, CAPE, and cross-dataset scenarios, and the approach demonstrates strong generalization and efficiency, with code and models to be released.
Abstract
Unsupervised non-rigid point cloud shape correspondence underpins a multitude of 3D vision tasks, yet it is itself non-trivial given the exponential complexity stemming from inter-point degrees of freedom, i.e., pose transformations. Based on the assumption of local rigidity, one solution for reducing complexity is to decompose the overall shape into independent local regions using Local Reference Frames (LRFs) that are invariant to SE(3) transformations. However, focusing solely on local structure neglects global geometric context, resulting in less distinctive LRFs that lack the semantic information necessary for effective matching. Furthermore, such complexity introduces out-of-distribution geometric contexts during inference, thus complicating generalization. To this end, we introduce 1) EquiShape, a novel structure tailored to learn pairwise LRFs with global structural cues for both spatial and semantic consistency, and 2) LRF-Refine, an optimization strategy generally applicable to LRF-based methods, aimed at addressing the generalization challenges. Specifically, for EquiShape, we employ cross-talk within separate equivariant graph neural networks (Cross-GVP) to build long-range dependencies that compensate for the lack of semantic information in local structure modeling, deducing pairwise independent SE(3)-equivariant LRF vectors for each point. For LRF-Refine, the optimization adjusts LRFs within specific contexts and knowledge, enhancing the geometric and semantic generalizability of point features. Our overall framework surpasses state-of-the-art methods by a large margin on three benchmarks. Code and models will be publicly available.
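The pose-decoupling idea behind LRFs can be illustrated with a minimal sketch. This is not the EquiShape implementation: where the paper learns SE(3)-equivariant LRF vectors with Cross-GVP, the sketch assumes two hand-crafted equivariant vectors per point (the offset to the cloud centroid and the offset to one other point) and orthonormalizes them with Gram-Schmidt. All function names here (`build_lrf`, `local_coords`, `apply_se3`, and the small vector helpers) are hypothetical and chosen for illustration only. The point of the sketch is that coordinates expressed in an equivariant frame do not change when the whole cloud is rotated and translated.

```python
import math

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def normalize(a):
    n = math.sqrt(dot(a, a))
    return [x / n for x in a]

def build_lrf(v1, v2):
    """Gram-Schmidt two non-collinear vectors into a right-handed orthonormal frame."""
    e1 = normalize(v1)
    proj = dot(v2, e1)
    e2 = normalize([y - proj * x for x, y in zip(e1, v2)])
    e3 = cross(e1, e2)  # rotates with e1 and e2 under proper rotations
    return [e1, e2, e3]

def centroid(points):
    return [sum(p[k] for p in points) / len(points) for k in range(3)]

def local_coords(points, i, j):
    """Express every point in the LRF anchored at points[i].

    Assumption: the two frame-defining vectors are hand-crafted offsets,
    standing in for the learned equivariant vectors described in the abstract.
    """
    origin = points[i]
    frame = build_lrf(sub(centroid(points), origin), sub(points[j], origin))
    return [[dot(e, sub(p, origin)) for e in frame] for p in points]

def matmul(A, B):
    return [[sum(A[r][k] * B[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]

def rot_z(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_x(a):
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def apply_se3(R, t, p):
    """Apply the rigid motion p -> R p + t."""
    return [dot(row, p) + tk for row, tk in zip(R, t)]

# A toy cloud and an arbitrary rigid motion (rotation plus translation).
cloud = [[0.0, 0.0, 0.0], [1.0, 0.2, 0.0], [0.3, 1.1, 0.4], [-0.5, 0.4, 1.2]]
R = matmul(rot_z(0.7), rot_x(-1.3))
t = [2.0, -1.0, 0.5]
moved = [apply_se3(R, t, p) for p in cloud]

before = local_coords(cloud, 0, 1)
after = local_coords(moved, 0, 1)

# Largest coordinate change in the local frame after the global motion.
max_diff = max(abs(x - y) for a, b in zip(before, after) for x, y in zip(a, b))
print(max_diff)
```

In this toy setting the frame depends only on local offsets plus the centroid, so it carries little of the global, pairwise context the abstract argues is needed for distinctive matching. That gap is what the learned LRF vectors are meant to close.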
