DiLO: Disentangled Latent Optimization for Learning Shape and Deformation in Grouped Deforming 3D Objects
Mostofa Rafid Uddin, Jana Armouti, Umong Sain, Md Asib Rahman, Xingjian Li, Min Xu
TL;DR
DiLO tackles unsupervised disentanglement of shape and deformation for grouped deforming 3D objects. It introduces a two-stage framework where Stage 1 performs latent optimization with a shared shape code per group and a per-object deformation code, modulated into a generator via AdaIN; Stage 2 trains two PointNet-based encoders for fast amortized inference. The method demonstrates strong performance on unsupervised deformation transfer, deformation classification, and explainability analyses across SMPL, SMAL, and COMA, often outperforming more complex baselines while maintaining efficiency. The approach yields interpretable latent factors, avoids adversarial training, and leverages shape-group information to achieve robust, scalable 3D shape–deformation disentanglement with practical downstream utility.
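To make the Stage 1 setup described above concrete, the following is a minimal NumPy sketch of a generator modulated by a group-shared shape code and a per-object deformation code through AdaIN. It is an illustration under stated assumptions, not the authors' implementation: the layer layout, the order of modulation (shape first, then deformation), the template input, all sizes, and every name (`adain`, `generate`, `params`, `shape_codes`, `deform_codes`) are hypothetical. The gradient-based joint optimization of codes and generator weights, as well as the regularizers, are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
NUM_GROUPS, OBJECTS_PER_GROUP = 2, 3
NUM_POINTS, HIDDEN, CODE_DIM = 64, 16, 4


def adain(features, code, w_scale, w_shift, eps=1e-5):
    """Normalize each channel over the point axis, then apply a
    scale and shift predicted linearly from the latent code."""
    mean = features.mean(axis=0, keepdims=True)
    std = features.std(axis=0, keepdims=True)
    normalized = (features - mean) / (std + eps)
    scale = 1.0 + code @ w_scale  # per-channel scale from the code
    shift = code @ w_shift        # per-channel shift from the code
    return normalized * scale + shift


def init(*shape):
    """Small random initialization (assumption, not from the paper)."""
    return rng.normal(scale=0.3, size=shape)


# Assumed generator layout: two modulated layers, then a 3D output layer.
params = {
    "w1": init(3, HIDDEN), "w2": init(HIDDEN, HIDDEN), "w_out": init(HIDDEN, 3),
    "shape_scale": init(CODE_DIM, HIDDEN), "shape_shift": init(CODE_DIM, HIDDEN),
    "deform_scale": init(CODE_DIM, HIDDEN), "deform_shift": init(CODE_DIM, HIDDEN),
}

# In latent optimization the codes are free variables rather than encoder
# outputs: one shape code per group, one deformation code per object.
shape_codes = init(NUM_GROUPS, CODE_DIM)
deform_codes = init(NUM_GROUPS, OBJECTS_PER_GROUP, CODE_DIM)

template = rng.normal(size=(NUM_POINTS, 3))  # stand-in generator input


def generate(shape_code, deform_code):
    """Map a (shape code, deformation code) pair to a 3D point set."""
    h = adain(template @ params["w1"], shape_code,
              params["shape_scale"], params["shape_shift"])
    h = np.maximum(h, 0.0)  # ReLU
    h = adain(h @ params["w2"], deform_code,
              params["deform_scale"], params["deform_shift"])
    h = np.maximum(h, 0.0)  # ReLU
    return h @ params["w_out"]


# Two objects from group 0: same shape code, different deformation codes.
obj_a = generate(shape_codes[0], deform_codes[0, 0])
obj_b = generate(shape_codes[0], deform_codes[0, 1])
```

Under this layout, deformation transfer amounts to calling `generate` with the shape code of one group and the deformation code of an object from another group.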
Abstract
In this work, we propose a disentangled latent optimization-based method for parameterizing grouped deforming 3D objects into shape and deformation factors in an unsupervised manner. Our approach involves the joint optimization of a generator network along with the shape and deformation factors, supported by specific regularization techniques. For efficient amortized inference of disentangled shape and deformation codes, we train two order-invariant PointNet-based encoder networks in the second stage of our method. We demonstrate several significant downstream applications of our method, including unsupervised deformation transfer, deformation classification, and explainability analysis. Extensive experiments conducted on 3D human, animal, and facial expression datasets demonstrate that our simple approach is highly effective in these downstream tasks, comparable or superior to existing methods with much higher complexity.
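The order invariance mentioned for the second-stage encoders can be illustrated with a small PointNet-style sketch: a shared per-point layer followed by a symmetric max pooling over points, so that reordering the input points leaves the code unchanged. This is only a sketch under assumptions. The layer sizes, the untrained random weights, and the names `make_encoder` and `encode` are hypothetical, and the training objective of the encoders is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
NUM_POINTS, HIDDEN, CODE_DIM = 128, 32, 8


def make_encoder(in_dim=3, hidden=HIDDEN, code_dim=CODE_DIM):
    """Create random (untrained) weights for a tiny PointNet-style encoder."""
    return {
        "w1": rng.normal(scale=0.5, size=(in_dim, hidden)),
        "b1": np.zeros(hidden),
        "w2": rng.normal(scale=0.5, size=(hidden, code_dim)),
        "b2": np.zeros(code_dim),
    }


def encode(params, points):
    """Encode a (num_points, 3) point set into a single latent code."""
    # Shared per-point layer: the same weights are applied to every point.
    h = np.maximum(points @ params["w1"] + params["b1"], 0.0)
    # Max pooling over the point axis is symmetric, hence order-invariant.
    pooled = h.max(axis=0)
    return pooled @ params["w2"] + params["b2"]


# Two separate encoders, one per latent factor.
shape_encoder = make_encoder()
deform_encoder = make_encoder()

cloud = rng.normal(size=(NUM_POINTS, 3))  # stand-in point cloud
perm = rng.permutation(NUM_POINTS)        # random reordering of the points

shape_code = encode(shape_encoder, cloud)
deform_code = encode(deform_encoder, cloud)
shape_code_permuted = encode(shape_encoder, cloud[perm])
```

Because the pooling step discards point order, `shape_code` and `shape_code_permuted` agree up to floating-point error, which is the property that makes such encoders suitable for unordered point sets.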
