DAE-Net: Deforming Auto-Encoder for fine-grained shape co-segmentation

Zhiqin Chen; Qimin Chen; Hang Zhou; Hao Zhang

TL;DR

DAE-Net tackles unsupervised 3D shape co-segmentation by learning deformable part templates shared across a shape collection. It uses an $N$-branch autoencoder: for each part $i$, the encoder predicts an affine transform $A_i$, a latent code $Z_i$, and an existence score $P_i$, and the decoder maps query points through a per-part deformer $\mathcal{D}_i$ and a part template $\mathcal{G}_i$ to produce occupancies. Because every shape is reconstructed from the same templates, the resulting parts are fine-grained and correspond across shapes. Constraints on the deformation capacity keep each deformed part faithful to its template, and a revival-based training strategy helps the network escape local minima; losses on reconstruction, deformation, and sparsity balance fidelity against part granularity. Experiments on ShapeNet Part, DFAUST, and an animal subset of Objaverse show better unsupervised co-segmentation than prior methods, meaningful shape clustering, and part-level detailization when integrated with DECOR-GAN. The approach advances open-set, template-based 3D segmentation and offers a flexible foundation for downstream tasks such as skeletonization and part-aware editing.

Abstract

We present an unsupervised 3D shape co-segmentation method which learns a set of deformable part templates from a shape collection. To accommodate structural variations in the collection, our network composes each shape by a selected subset of template parts which are affine-transformed. To maximize the expressive power of the part templates, we introduce a per-part deformation network to enable the modeling of diverse parts with substantial geometry variations, while imposing constraints on the deformation capacity to ensure fidelity to the originally represented parts. We also propose a training scheme to effectively overcome local minima. Architecturally, our network is a branched autoencoder, with a CNN encoder taking a voxel shape as input and producing per-part transformation matrices, latent codes, and part existence scores, and the decoder outputting point occupancies to define the reconstruction loss. Our network, coined DAE-Net for Deforming Auto-Encoder, can achieve unsupervised 3D shape co-segmentation that yields fine-grained, compact, and meaningful parts that are consistent across diverse shapes. We conduct extensive experiments on the ShapeNet Part dataset, DFAUST, and an animal subset of Objaverse to show superior performance over prior methods. Code and data are available at https://github.com/czq142857/DAE-Net.

Paper Structure (19 sections, 6 equations, 17 figures, 3 tables)

Introduction
Related work
3D co-segmentation with handcrafted priors.
Co-segmentation via 3D shape reconstruction.
Zero-shot 3D segmentation using pretrained models.
Transforming Auto-encoders.
Method
Network architecture
Loss functions
Overcoming local minima
Training details
Experiments
Unsupervised shape co-segmentation
Ablation studies
Shape clustering
...and 4 more sections

Figures (17)

Figure 1: Network architecture of DAE-Net. Our network consists of $N$ branches representing $N$ parts of a 3D shape. To reconstruct part $i$, the query point coordinates in the world frame are first transformed into the local frame of the part using an affine matrix predicted by a shape encoder network, a CNN. The transformed local coordinates are then deformed by a deformation MLP $\mathcal{D}_i$ to refine the part details; $\mathcal{D}_i$ is conditioned on a latent code that is both shape- and part-specific and is also produced by the CNN. Finally, the deformed local coordinates are fed into a part template MLP $\mathcal{G}_i$ to produce the occupancy of the query point. The occupancy is multiplied by the part existence score predicted by the CNN, so that the occupancy is set to zero if the part does not exist in the shape. We sum the occupancies from all $N$ parts to obtain the occupancy of the query point on the entire shape, which is used to compute the reconstruction loss.
Figure 2: Qualitative results on shape segmentation compared to BAE-Net [chen2019bae_net] and RIM-Net [niu2022rim] on the ShapeNet Part dataset [chang2015shapenet, yi2016scalable]. Within the same category, the same color indicates that the parts come from the same branch of the network and are therefore considered to correspond. Since the ground truth segmentation in the ShapeNet Part dataset is on point clouds, we color the voxels in (d) using nearest neighbor.
Figure 3: Qualitative results on shape segmentation on an animal subset of Objaverse [deitke2023objaverse] and on DFAUST [dfaust]. We also show the skeletons built upon our segmentation. More results can be found in Figure \ref{fig:supp_animal_2x} and the Supplementary.
Figure 4: Qualitative results of the ablation study on airplane, chair, and guitar. See Section \ref{subsec:ablation} for the meaning of the abbreviations.
Figure 5: Ablation study on the weight $\gamma$ of the sparsity loss $\mathcal{L}_{sparse}$ on mug, pistol, and table. The number under each shape shows the IoU of its category when trained with a specific $\gamma$ value.
...and 12 more figures
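
The per-branch data flow described in the Figure 1 caption can be illustrated with a small NumPy sketch. It is not the authors' implementation: the tiny random MLPs, the residual form of the deformation, the sigmoid on the template output, and all names (`part_occupancy`, `shape_occupancy`, `random_inputs`) are assumptions made for illustration, and random values stand in for the CNN encoder outputs $A_i$, $Z_i$, and $P_i$.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(sizes):
    """Random weights for a small MLP (hypothetical stand-in for D_i or G_i)."""
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    """ReLU on hidden layers, linear output layer."""
    for W, b in params[:-1]:
        x = np.maximum(x @ W + b, 0.0)
    W, b = params[-1]
    return x @ W + b

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def part_occupancy(points, A, z, p, deformer, template):
    """Occupancy contributed by one part at world-frame query points.

    points: (Q, 3), A: (3, 4) affine matrix, z: (C,) latent code,
    p: scalar existence score in [0, 1].
    """
    q = len(points)
    # 1. World frame -> part-local frame via the predicted affine matrix.
    homo = np.concatenate([points, np.ones((q, 1))], axis=1)   # (Q, 4)
    local = homo @ A.T                                         # (Q, 3)
    # 2. Deform local coordinates, conditioned on the latent code.
    #    Predicting a residual offset is an assumption of this sketch.
    cond = np.concatenate([local, np.broadcast_to(z, (q, z.size))], axis=1)
    deformed = local + mlp(deformer, cond)                     # (Q, 3)
    # 3. Part template maps deformed coordinates to occupancy.
    #    The sigmoid squashing is also an assumption.
    occ = sigmoid(mlp(template, deformed))[:, 0]               # (Q,)
    # 4. Gate by the existence score: an absent part contributes zero.
    return p * occ

def shape_occupancy(points, As, zs, ps, deformers, templates):
    """Sum per-part occupancies over all N branches, as in the caption."""
    parts = np.stack([part_occupancy(points, A, z, p, D, G)
                      for A, z, p, D, G in zip(As, zs, ps, deformers, templates)])
    return parts.sum(axis=0), parts                            # (Q,), (N, Q)

def random_inputs(n_parts, latent_dim, hidden=16):
    """Random placeholders for the encoder outputs and the per-part MLPs."""
    As = [np.hstack([np.eye(3), rng.normal(0.0, 0.1, (3, 1))])
          for _ in range(n_parts)]
    zs = [rng.normal(size=latent_dim) for _ in range(n_parts)]
    ps = rng.uniform(size=n_parts)
    deformers = [init_mlp([3 + latent_dim, hidden, 3]) for _ in range(n_parts)]
    templates = [init_mlp([3, hidden, 1]) for _ in range(n_parts)]
    return As, zs, ps, deformers, templates

points = rng.uniform(-0.5, 0.5, (10, 3))
total, parts = shape_occupancy(points, *random_inputs(n_parts=4, latent_dim=8))
# One plausible way to read off a segmentation (an assumption here):
# label each query point with the branch that contributes the most occupancy.
labels = parts.argmax(axis=0)
```

In this sketch, `total` plays the role of the whole-shape occupancy that enters the reconstruction loss, while the per-branch array `parts` is what makes a segmentation available without part labels.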
