DiLO: Disentangled Latent Optimization for Learning Shape and Deformation in Grouped Deforming 3D Objects

Mostofa Rafid Uddin, Jana Armouti, Umong Sain, Md Asib Rahman, Xingjian Li, Min Xu

TL;DR

DiLO tackles unsupervised disentanglement of shape and deformation for grouped deforming 3D objects. It introduces a two-stage framework where Stage 1 performs latent optimization with a shared shape code per group and a per-object deformation code, modulated into a generator via AdaIN; Stage 2 trains two PointNet-based encoders for fast amortized inference. The method demonstrates strong performance on unsupervised deformation transfer, deformation classification, and explainability analyses across SMPL, SMAL, and COMA, often outperforming more complex baselines while maintaining efficiency. The approach yields interpretable latent factors, avoids adversarial training, and leverages shape-group information to achieve robust, scalable 3D shape–deformation disentanglement with practical downstream utility.
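The Stage 1 setup described above can be illustrated with a minimal sketch. It shows two ideas only: a code table in which every object in a group looks up the same shape code while keeping its own deformation code, and an AdaIN step that re-normalizes each feature channel and then applies a per-channel scale and shift. The names `shape_codes`, `deform_codes`, `group_of`, `codes_for`, and `adain` are hypothetical, the code sizes are arbitrary, and the sketch does not say which code drives the modulation or how the scale and shift are predicted from it in the authors' generator.

```python
import math

# Hypothetical latent tables: one shape code per group, one deformation code per object.
shape_codes = {"group_a": [0.5, -0.2]}
deform_codes = {"obj_0": [0.1, 0.3], "obj_1": [-0.4, 0.2]}
group_of = {"obj_0": "group_a", "obj_1": "group_a"}


def codes_for(obj_id):
    """Return (shared shape code of the object's group, the object's own deformation code)."""
    return shape_codes[group_of[obj_id]], deform_codes[obj_id]


def adain(features, scale, shift, eps=1e-5):
    """Adaptive instance normalization for a single object.

    features: list of channels, each a list of per-point activations.
    scale, shift: one value per channel (assumed to be predicted from a latent code).
    """
    out = []
    for channel, g, b in zip(features, scale, shift):
        mean = sum(channel) / len(channel)
        var = sum((x - mean) ** 2 for x in channel) / len(channel)
        std = math.sqrt(var + eps)
        # Normalize the channel, then impose the code-dependent statistics.
        out.append([g * (x - mean) / std + b for x in channel])
    return out


features = [[1.0, 2.0, 3.0]]  # one channel, three points
modulated = adain(features, scale=[2.0], shift=[1.0])
```

In latent optimization, entries such as those in `shape_codes` and `deform_codes` would be free parameters updated by gradient descent together with the generator weights, rather than outputs of an encoder.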

Abstract

In this work, we propose a disentangled latent optimization-based method for parameterizing grouped deforming 3D objects into shape and deformation factors in an unsupervised manner. Our approach involves the joint optimization of a generator network along with the shape and deformation factors, supported by specific regularization techniques. For efficient amortized inference of disentangled shape and deformation codes, we train two order-invariant PointNet-based encoder networks in the second stage of our method. We demonstrate several significant downstream applications of our method, including unsupervised deformation transfer, deformation classification, and explainability analysis. Extensive experiments conducted on 3D human, animal, and facial expression datasets demonstrate that our simple approach is highly effective in these downstream tasks, comparable or superior to existing methods with much higher complexity.
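The order invariance of the Stage 2 encoders can be seen in a small sketch of the core PointNet idea: apply the same layer to every point, then pool with a symmetric function such as max. The function name `pointnet_encode`, the single linear layer, and the toy weights are assumptions made for illustration; the authors' encoders are presumably deeper and trained to regress the optimized codes.

```python
def pointnet_encode(points, weight, bias):
    """Shared per-point linear layer with ReLU, followed by max-pooling over points."""
    per_point = [
        [max(0.0, sum(w * x for w, x in zip(row, p)) + b) for row, b in zip(weight, bias)]
        for p in points
    ]
    # Max over the point axis is symmetric, so the point order does not matter.
    return [max(column) for column in zip(*per_point)]


# Toy point cloud with three 3D points and a hypothetical 3 -> 2 layer.
points = [(0.0, 1.0, 2.0), (1.0, 0.0, -1.0), (0.5, 0.5, 0.5)]
weight = [[1.0, 0.0, 0.0], [0.0, -1.0, 1.0]]
bias = [0.0, 0.1]

code = pointnet_encode(points, weight, bias)
code_reversed = pointnet_encode(points[::-1], weight, bias)
```

Because each point passes through the same weights and the pooling ignores ordering, `code` and `code_reversed` are identical, which is the property that lets such an encoder consume unordered point sets.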

Paper Structure

This paper contains 28 sections, 10 equations, 12 figures, 4 tables.

Figures (12)

  • Figure 1: An overview of our proposed unsupervised shape deformation disentanglement method. On the left, we show the conceptualization of shape codes $s^{(i)}$ and deformation codes $z^{(i)}$. On the right, we demonstrate the two learning stages of our method. In stage 1, we optimize $s^{(i)}$ and $z^{(i)}$ together with a generator network. In stage 2, we infer the optimized codes $s^{(i)}$ and $z^{(i)}$ from the input 3D object using two PointNet (Qi et al., 2017) encoders.
  • Figure 2: Unsupervised 3D deformation transfer on the SMPL (left) and SMAL (right) datasets using our method. Additional visualizations can be found in the supplementary material.
  • Figure 3: Qualitative results of DiLO on COMA. Top row: identity sources (shape codes). Left column: expression sources (content codes). Middle: generated faces combining identity and expression, accurately reflecting both source traits.
  • Figure 4: Explainability results for DiLO. (a) A sample 3D mesh. (b) Vertex importance learned by the DiLO content encoder. (c) Vertex importance learned by the DiLO class encoder. Red represents high importance; blue represents low importance.
  • Figure 5: Unsupervised 3D deformation transfer with DiLO on the SMPL-NPT dataset.
  • ...and 7 more figures