ShapeMatcher: Self-Supervised Joint Shape Canonicalization, Segmentation, Retrieval and Deformation

Yan Di; Chenyangguang Zhang; Chaowei Wang; Ruida Zhang; Guangyao Zhai; Yanyan Li; Bowen Fu; Xiangyang Ji; Shan Gao

TL;DR

ShapeMatcher addresses partial 3D shape reconstruction under arbitrary poses by jointly learning four interrelated tasks: canonicalization, segmentation, retrieval, and deformation. It uses $SE(3)$-invariant and affine-invariant features to disentangle structure from pose/size, performs region-aware retrieval, and deforms a retrieved CAD model via neural cage deformation guided by part centers. The framework employs cross-task consistency losses and a staged, self-supervised training regime, achieving large gains on PartNet, ComplementMe, and real-world Scan2CAD, including zero-shot scenarios. This approach demonstrates strong robustness to occlusion and pose variation with potential impact on robotics and 3D perception applications.
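
ShapeMatcher learns its point-wise invariant features with a network. The Python sketch below is a much simpler, non-learned illustration of what "invariant" means here. It computes two hand-crafted per-point features that stay unchanged when the whole cloud is rotated, translated, or uniformly scaled. The features, function names, and toy cloud are our own assumptions and are not the paper's method. They are invariant only to similarity transforms, which is a narrower class than the affine transforms the paper targets: a shear, for example, changes them.

```python
import math

def centroid(points):
    """Mean of a list of (x, y, z) tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def invariant_features(points):
    """Two scalar features per point that do not change under rotation,
    translation or uniform scaling of the whole cloud (hand-crafted, hypothetical).

    f0: distance to the centroid, divided by the mean centroid distance.
    f1: mean distance to every other point, divided by the same scale.
    Assumes at least two distinct points.
    """
    c = centroid(points)
    radial = [math.dist(p, c) for p in points]
    scale = sum(radial) / len(radial)
    feats = []
    for i, p in enumerate(points):
        others = [math.dist(p, q) for j, q in enumerate(points) if j != i]
        feats.append((radial[i] / scale, sum(others) / len(others) / scale))
    return feats

def rotation_matrix(axis, angle):
    """Rodrigues' formula: 3x3 rotation about `axis` by `angle` radians."""
    n = math.sqrt(sum(a * a for a in axis))
    x, y, z = (a / n for a in axis)
    c, s = math.cos(angle), math.sin(angle)
    t = 1.0 - c
    return [
        [t * x * x + c,     t * x * y - s * z, t * x * z + s * y],
        [t * x * y + s * z, t * y * y + c,     t * y * z - s * x],
        [t * x * z - s * y, t * y * z + s * x, t * z * z + c],
    ]

def similarity_transform(points, R, scale, shift):
    """Apply p -> scale * (R @ p) + shift to every point."""
    out = []
    for p in points:
        rp = [sum(R[r][k] * p[k] for k in range(3)) for r in range(3)]
        out.append(tuple(scale * rp[r] + shift[r] for r in range(3)))
    return out

# Toy check: features of the original and transformed clouds agree.
cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0),
         (0.0, 0.0, 3.0), (1.0, 1.0, 1.0)]
R = rotation_matrix((1.0, 2.0, 0.5), 0.7)
moved = similarity_transform(cloud, R, scale=2.5, shift=(4.0, -1.0, 0.3))
for a, b in zip(invariant_features(cloud), invariant_features(moved)):
    print(a, b)  # each pair matches up to floating-point error
```

Both features depend on the centroid of the observed points, so they shift when part of the object is occluded. Handling partial, arbitrarily posed inputs is the problem that the learned canonicalization is meant to address.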

Abstract

In this paper, we present ShapeMatcher, a unified self-supervised learning framework for joint shape canonicalization, segmentation, retrieval and deformation. Given a partially observed object in an arbitrary pose, we first canonicalize the object by extracting point-wise affine-invariant features, which disentangle the object's inherent structure from its pose and size. These learned features are then used to predict a semantically consistent part segmentation and the corresponding part centers. Next, our lightweight retrieval module aggregates the features within each part into a retrieval token and compares all the tokens with those of source shapes from a pre-established database to identify the most geometrically similar shape. Finally, the deformation module deforms the retrieved shape to tightly fit the input object using part-center-guided neural cage deformation. The key insight of ShapeMatcher is to train the four highly associated processes (canonicalization, segmentation, retrieval, and deformation) simultaneously, using cross-task consistency losses for mutual supervision. Extensive experiments on the synthetic datasets PartNet and ComplementMe and on the real-world dataset Scan2CAD show that ShapeMatcher surpasses competing methods by a large margin.
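
The retrieval step described above aggregates per-part features into tokens and compares them against a database. It can be sketched in plain Python. This is a hypothetical illustration and not the paper's module. We assume mean pooling to form each part token and cosine similarity to compare tokens. We also assume a score that weights each part by its share of the observed target points and ignores parts absent from a partial target. All names and the toy database values are made up.

```python
import math

def part_tokens(features, labels, num_parts):
    """Mean-pool per-point feature vectors inside each part.

    Returns (tokens, counts); a part with no observed points gets token None.
    Assumes `features` is non-empty and all vectors share one length.
    """
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(num_parts)]
    counts = [0] * num_parts
    for f, lab in zip(features, labels):
        counts[lab] += 1
        for d in range(dim):
            sums[lab][d] += f[d]
    tokens = [[v / n for v in s] if n else None for s, n in zip(sums, counts)]
    return tokens, counts

def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def part_weighted_score(t_tokens, t_counts, s_tokens):
    """Per-part cosine similarity, weighted by each part's share of target points.
    Parts missing from the (possibly partial) target contribute nothing."""
    total = sum(t_counts)
    score = 0.0
    for t, n, s in zip(t_tokens, t_counts, s_tokens):
        if n == 0 or s is None:
            continue
        score += (n / total) * cosine(t, s)
    return score

def retrieve(features, labels, database, num_parts):
    """Return the key of the database shape whose part tokens best match the target."""
    t_tokens, t_counts = part_tokens(features, labels, num_parts)
    return max(database, key=lambda k: part_weighted_score(t_tokens, t_counts, database[k]))

# Toy database: two source shapes, each with two 2-D part tokens (hypothetical values).
database = {
    "source_a": [[1.0, 0.0], [0.0, 1.0]],
    "source_b": [[0.0, 1.0], [1.0, 0.0]],
}
# Partial target: only part 0 is observed.
target_features = [[0.9, 0.1], [1.1, -0.1], [1.0, 0.0]]
target_labels = [0, 0, 0]
print(retrieve(target_features, target_labels, database, num_parts=2))  # source_a
```

Skipping unobserved parts keeps an occluded region from penalizing otherwise well-matching sources. This reflects the occlusion-robust intent of region-aware retrieval, although the actual weighting in ShapeMatcher may differ.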

Paper Structure

This paper contains 14 sections, 15 equations, 4 figures, and 5 tables.

Introduction
Related Works
Method
Canonicalization
Segmentation
Retrieval
Deformation
ShapeMatcher: Joint Training
Experiments
Experimental Setup
Synthetic Cases
Real-world Cases
Ablation Studies
Conclusion

Figures (4)

Figure 1: Illustration of ShapeMatcher. Objects obtained from real-world scans are typically noisy and partial and appear in various poses, which makes effective retrieval and deformation (R&D) challenging (red 'X' on the left). To address this issue, we propose ShapeMatcher, which first canonicalizes the objects and then segments them into semantic parts, facilitating the R&D process (green '✓' on the right).
Figure 2: The pipeline of ShapeMatcher. Given a target point cloud obtained from a single-view scan and a pre-established database (A), ShapeMatcher produces a fine-grained reconstruction using four jointly trained modules: Canonicalization (B), Segmentation (C), Retrieval (D) and Deformation (E). Each of the first three modules contains a partial branch that processes the target and a full branch that processes the sources. Specifically, the target and source inputs are first canonicalized into the same affine-invariant space (B). A semantically consistent region segmentation is then predicted from the affine-invariant features (C). The segmented regions are fed to the region-weighted retrieval module (D) and the part-center-guided neural cage deformation module (E) for an occlusion-robust R&D process. During training, partial-full consistency losses (F) are enforced between the two branches.
Figure 3: Qualitative R&D results with full target inputs on PartNet.
Figure 4: Qualitative R&D results with partial target inputs at an occlusion rate of 25% on PartNet.
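
Figure 2 routes the segmented regions into a part-center-guided neural cage deformation module (E). The Python sketch below illustrates only the cage idea, under simplifying assumptions. The shape is controlled through a coarse cage, and every point moves by a blend of the cage-vertex displacements. In place of the generalized barycentric coordinates commonly used for cage-based deformation, such as mean value coordinates, we substitute normalized inverse-distance weights. The guidance rule is also hypothetical: each cage vertex follows the offset of its nearest source part center. Nothing here is learned, and all function names are ours.

```python
import math

def cage_offsets(cage, src_centers, tgt_centers):
    """Hypothetical guidance rule: each cage vertex takes the offset
    (target center - source center) of its nearest source part center."""
    offsets = []
    for v in cage:
        k = min(range(len(src_centers)), key=lambda i: math.dist(v, src_centers[i]))
        offsets.append(tuple(t - s for t, s in zip(tgt_centers[k], src_centers[k])))
    return offsets

def deform(points, cage, offsets, power=2.0, eps=1e-12):
    """Move each point by the inverse-distance-weighted average of the
    cage-vertex offsets (weights are normalised to sum to one)."""
    out = []
    for p in points:
        raw = [1.0 / (math.dist(p, v) ** power + eps) for v in cage]
        total = sum(raw)
        disp = [sum(w * o[d] for w, o in zip(raw, offsets)) / total for d in range(3)]
        out.append(tuple(p[d] + disp[d] for d in range(3)))
    return out

# Toy example: a box cage around a two-part shape (a lower and an upper part).
cage = [(x, y, z) for x in (-1.0, 1.0) for y in (-1.0, 1.0) for z in (-1.0, 1.0)]
src_centers = [(0.0, -0.5, 0.0), (0.0, 0.5, 0.0)]
tgt_centers = [(0.0, -0.5, 0.0), (0.0, 0.9, 0.0)]  # upper part should move up by 0.4
points = [(0.0, -0.5, 0.0), (0.0, 0.5, 0.0)]
deformed = deform(points, cage, cage_offsets(cage, src_centers, tgt_centers))
print(deformed)  # the upper point rises more than the lower one
```

Because the sketch blends displacements rather than reconstructing positions from the cage, an unchanged cage leaves every point in place. Moving all part centers by the same vector translates the whole shape by exactly that vector.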
