Human Geometry Distribution for 3D Animation Generation
Xiangjun Tang, Biao Zhang, Peter Wonka
TL;DR
This work tackles the challenge of generating realistic 3D human geometry animations from limited data with a two-stage framework. (1) HuGeoDis learns a compact, uniform latent space for human geometry through an improved mapping between SMPL and avatar geometries. (2) A conditional diffusion-based animation model, built on short-term transitions and identity conditioning, synthesizes diverse, temporally coherent clothing dynamics. The latent space reconstructs geometry more efficiently and with substantially lower Chamfer error. The animation model produces rich garment dynamics and higher user-study scores, outperforming existing avatar-generation baselines as well as its own ablated variants. Together, these components yield generative, pose-responsive 3D avatars with fine geometric detail and natural clothing behavior, even when training data are scarce. The approach highlights the value of distribution-based geometry representations and short-term transition modeling for scalable, high-quality 3D avatar animation.
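The pairing of short-term transitions with identity conditioning can be illustrated, purely as a hypothetical sketch, by an autoregressive rollout. Each step produces the next frame's latent from only the previous latent and the current pose, while a single identity latent is passed to every step so the avatar stays consistent over long sequences. The paper's actual networks, latent sizes, and conditioning scheme are not described here. `toy_transition` is a made-up deterministic stand-in for the conditional diffusion sampler, and `rollout`, `Vector`, and `TransitionFn` are likewise illustrative names, not the authors' API.

```python
import random
from typing import Callable, List, Sequence

Vector = List[float]
# Hypothetical signature: (previous latent, current pose, identity latent) -> next latent.
TransitionFn = Callable[[Vector, Vector, Vector], Vector]


def toy_transition(prev: Vector, pose: Vector, identity: Vector) -> Vector:
    """Hypothetical stand-in for a conditional diffusion sampler: a fixed linear blend."""
    return [0.5 * a + 0.3 * b + 0.2 * c for a, b, c in zip(prev, pose, identity)]


def rollout(initial: Vector, poses: Sequence[Vector], identity: Vector,
            step: TransitionFn) -> List[Vector]:
    """Generate one latent per pose.

    Each step sees only the previous latent (a short-term transition), while the
    same identity latent is passed to every step (long-term consistency).
    """
    frames: List[Vector] = []
    prev = initial
    for pose in poses:
        prev = step(prev, pose, identity)
        frames.append(prev)
    return frames


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 4
    identity = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # hypothetical identity latent
    poses = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(8)]  # hypothetical pose codes
    frames = rollout([0.0] * dim, poses, identity, toy_transition)
    print(len(frames), "frames generated")
```

Under this framing, a transition model only ever needs short windows of motion, so any point in a clip can serve as a training example. That is one plausible way short-term modeling stretches limited data, offered here as an interpretation rather than a statement of the paper's method.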
Abstract
Generating realistic human geometry animations remains challenging, as it requires modeling natural clothing dynamics with fine-grained geometric detail from limited data. To address these challenges, we propose two novel designs. First, we introduce a compact distribution-based latent representation that enables efficient, high-quality geometry generation; it improves upon previous work by establishing a more uniform mapping between SMPL and avatar geometries. Second, we design a generative animation model that fully exploits the diversity of limited motion data by focusing on short-term transitions while maintaining long-term consistency through an identity-conditioned design. Together, these two designs form a two-stage framework: the first stage learns a latent space, and the second learns to generate animations within it. We evaluate both the latent space and the animation model. Our latent space produces high-fidelity human geometry that surpasses previous methods ($90\%$ lower Chamfer distance), and the animation model synthesizes diverse animations with detailed, natural dynamics ($2.2\times$ higher user study score), achieving the best results across all evaluation metrics.
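The Chamfer distance behind the $90\%$ figure compares two point sets sampled from surfaces. A minimal sketch, assuming the common symmetric variant $d_{CD}(P,Q) = \frac{1}{|P|}\sum_{p \in P}\min_{q \in Q}\|p-q\|_2^2 + \frac{1}{|Q|}\sum_{q \in Q}\min_{p \in P}\|q-p\|_2^2$, is shown below. The paper may instead use unsquared distances, sums rather than means, or a different unit scaling. `chamfer_distance` and `relative_reduction` are illustrative names, and the brute-force search is only suitable for small point sets.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def squared_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two 3D points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def chamfer_distance(p: Sequence[Point], q: Sequence[Point]) -> float:
    """Symmetric Chamfer distance (assumed variant): mean squared
    nearest-neighbour distance from P to Q plus from Q to P.
    Brute force, O(|P| * |Q|)."""
    if not p or not q:
        raise ValueError("point sets must be non-empty")
    p_to_q = sum(min(squared_dist(a, b) for b in q) for a in p) / len(p)
    q_to_p = sum(min(squared_dist(b, a) for a in p) for b in q) / len(q)
    return p_to_q + q_to_p


def relative_reduction(baseline: float, ours: float) -> float:
    """Fractional reduction of an error metric, e.g. 0.9 means 90% lower."""
    return 1.0 - ours / baseline
```

For meshes with many thousands of sampled points, the nearest-neighbour queries would normally use a spatial index such as SciPy's `cKDTree` instead of the nested loops above. A "$90\%$ lower" Chamfer distance corresponds to `relative_reduction(baseline, ours)` returning 0.9, i.e. an error one tenth of the baseline's.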
