DAE-Net: Deforming Auto-Encoder for fine-grained shape co-segmentation
Zhiqin Chen, Qimin Chen, Hang Zhou, Hao Zhang
TL;DR
DAE-Net tackles unsupervised 3D shape co-segmentation by learning deformable part templates shared across a shape collection. It uses an $N$-branch autoencoder: the encoder predicts, for each part $i$, an affine transform $A_i$, a latent code $Z_i$, and an existence score $P_i$, and the decoder outputs point occupancies through per-part templates $G_i$ and deformers $D_i$, which yields fine-grained part correspondences across shapes. Constraints on deformation capacity keep each deformed part faithful to its template, and a training scheme with a revival-based strategy helps the network escape local minima; losses on reconstruction, deformation, and sparsity balance fidelity against part granularity. Experiments on ShapeNet Part, DFAUST, and an animal subset of Objaverse show better unsupervised co-segmentation than prior methods, meaningful shape clustering, and a practical part-level detailization capability when integrated with DECOR-GAN. The approach advances open-set, template-based 3D segmentation and offers a flexible foundation for downstream tasks such as skeletonization and part-aware editing.
Abstract
We present an unsupervised 3D shape co-segmentation method that learns a set of deformable part templates from a shape collection. To accommodate structural variations in the collection, our network composes each shape from a selected subset of template parts, each of which is affine-transformed. To maximize the expressive power of the part templates, we introduce a per-part deformation network to enable the modeling of diverse parts with substantial geometry variations, while imposing constraints on the deformation capacity to ensure fidelity to the originally represented parts. We also propose a training scheme to effectively overcome local minima. Architecturally, our network is a branched autoencoder, with a CNN encoder taking a voxel shape as input and producing per-part transformation matrices, latent codes, and part existence scores, and the decoder outputting point occupancies to define the reconstruction loss. Our network, coined DAE-Net for Deforming Auto-Encoder, can achieve unsupervised 3D shape co-segmentation that yields fine-grained, compact, and meaningful parts that are consistent across diverse shapes. We conduct extensive experiments on the ShapeNet Part dataset, DFAUST, and an animal subset of Objaverse to show superior performance over prior methods. Code and data are available at https://github.com/czq142857/DAE-Net.
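
To make the branched decoder concrete, the following is a minimal sketch of how one query point might be evaluated, not the released implementation. It assumes that each branch maps the point into its template space with the affine transform $A_i$, adds a bounded offset from the deformer $D_i$ conditioned on $Z_i$, queries the template $G_i$, scales the result by the existence score $P_i$, and that the parts are combined with a max (union). The abstract does not state the composition rule, so the max is an assumption, and `toy_deformer` and `toy_template` are hypothetical analytic stand-ins for the learned networks.

```python
import math

def translation(tx, ty, tz):
    """Hypothetical helper: a 3x4 affine matrix with identity rotation."""
    return [(1.0, 0.0, 0.0, tx), (0.0, 1.0, 0.0, ty), (0.0, 0.0, 1.0, tz)]

def affine(A, p):
    """Apply a 3x4 affine matrix A to a 3D point p (stands in for A_i)."""
    return tuple(r[0] * p[0] + r[1] * p[1] + r[2] * p[2] + r[3] for r in A)

def toy_deformer(q, z, max_offset=0.1):
    """Stand-in for D_i: a per-axis offset driven by the latent code z.

    tanh bounds every offset to +/- max_offset, a crude analogue of
    limiting the deformation capacity.
    """
    return tuple(max_offset * math.tanh(zk * qk) for zk, qk in zip(z, q))

def toy_template(q, radius=0.5, sharpness=20.0):
    """Stand-in for G_i: soft occupancy of a sphere in template space."""
    dist = math.sqrt(sum(c * c for c in q))
    x = min(sharpness * (dist - radius), 60.0)  # clamp to avoid overflow
    return 1.0 / (1.0 + math.exp(x))

def part_occupancy(p, A, z, exists):
    """Occupancy contributed by one branch at world-space point p."""
    q = affine(A, p)                      # world space -> template space
    offset = toy_deformer(q, z)           # bounded deformation
    q_def = tuple(a + b for a, b in zip(q, offset))
    return exists * toy_template(q_def)   # gate by existence score P_i

def shape_occupancy(p, parts):
    """Assumed composition: union of parts as a max over branches."""
    return max(part_occupancy(p, A, z, e) for A, z, e in parts)

def part_label(p, parts):
    """Index of the branch with the highest occupancy at p."""
    scores = [part_occupancy(p, A, z, e) for A, z, e in parts]
    return scores.index(max(scores))

# Two branches: one centred at the origin, one centred at x = 2.
parts_both = [
    (translation(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), 1.0),
    (translation(-2.0, 0.0, 0.0), (0.0, 0.0, 0.0), 1.0),
]
# Same shape collection entry, but the second part is switched off.
parts_one = [parts_both[0], (parts_both[1][0], parts_both[1][1], 0.0)]
```

Under these assumptions, `part_label` is what turns the reconstruction into a segmentation: a point is assigned to whichever branch claims it most strongly, and because the branch index is shared across shapes, the labels are consistent across the collection. Setting an existence score to zero removes that part from the shape without touching the others.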
