Human Geometry Distribution for 3D Animation Generation

Xiangjun Tang, Biao Zhang, Peter Wonka

TL;DR

This work tackles the challenge of generating realistic 3D human geometry animations under limited data by introducing a two-stage framework that (1) learns a compact, uniform latent space for human geometry via HuGeoDis with improved SMPL-to-avatar mappings, and (2) trains a conditional diffusion-based animation model to synthesize diverse, temporally coherent clothing dynamics using short-term transitions and identity conditioning. The latent space delivers high-fidelity geometry with substantially reduced Chamfer error and better reconstruction efficiency, while the animation model achieves rich garment dynamics and superior user-study scores, outperforming existing avatar-generation baselines and ablations. Together, these components enable generative, pose-responsive 3D avatars with fine geometric detail and natural clothing behavior, even when training data are scarce. The approach highlights the value of distribution-based geometry representations and short-term transition modeling for scalable, high-quality 3D avatar animation.

Abstract

Generating realistic human geometry animations remains a challenging task, as it requires modeling natural clothing dynamics with fine-grained geometric details under limited data. To address these challenges, we propose two novel designs. First, we propose a compact distribution-based latent representation that enables efficient and high-quality geometry generation. We improve upon previous work by establishing a more uniform mapping between SMPL and avatar geometries. Second, we introduce a generative animation model that fully exploits the diversity of limited motion data. We focus on short-term transitions while maintaining long-term consistency through an identity-conditioned design. Together, these two designs form a two-stage framework: the first stage learns a latent space, while the second learns to generate animations within this latent space. We conducted experiments on both our latent space and our animation model. We demonstrate that our latent space produces high-fidelity human geometry, surpassing previous methods ($90\%$ lower Chamfer distance). The animation model synthesizes diverse animations with detailed and natural dynamics ($2.2 \times$ higher user study score), achieving the best results across all evaluation metrics.
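For readers unfamiliar with the Chamfer distance quoted above, the following is a minimal sketch of one common convention: the mean nearest-neighbor Euclidean distance from each set to the other, summed over both directions. The exact variant used in the paper (squared or unsquared distances, normalization) is not stated in this section, so the function `chamfer_distance` and the toy point sets below are illustrative assumptions and do not reproduce the authors' evaluation code or data.

```python
import math
from typing import Sequence

Point = Sequence[float]


def chamfer_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Chamfer distance between two non-empty point sets.

    Assumed convention (not taken from the paper): mean nearest-neighbor
    Euclidean distance from a to b, plus the same from b to a.
    Brute force, O(len(a) * len(b)); fine for a toy illustration only.
    """
    if not a or not b:
        raise ValueError("both point sets must be non-empty")

    def one_sided(src: Sequence[Point], dst: Sequence[Point]) -> float:
        # For every source point, find the closest destination point.
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return one_sided(a, b) + one_sided(b, a)


# Hypothetical toy data: a reference shape and two reconstructions of it.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
baseline = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]   # far from the reference
improved = [(0.0, 0.0, 0.1), (1.0, 0.0, 0.1)]   # close to the reference

cd_baseline = chamfer_distance(reference, baseline)
cd_improved = chamfer_distance(reference, improved)

# Relative reduction of the error, expressed as a fraction of the baseline.
relative_reduction = 1.0 - cd_improved / cd_baseline
print(f"baseline={cd_baseline:.3f} improved={cd_improved:.3f} "
      f"reduction={relative_reduction:.0%}")
```

A statement such as "lower Chamfer distance" is a relative reduction of this kind: the error of the new method divided by the error of the baseline, subtracted from one. Practical evaluations on dense point clouds would typically replace the brute-force search with a spatial index such as a k-d tree.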

Paper Structure

This paper contains 27 sections, 11 equations, 10 figures, 3 tables.

Figures (10)

  • Figure 1: Our generative framework produces diverse avatar geometry sequences from noise, with geometries represented as points (a). For visualization, these points can be rendered via Gaussian splatting (GS), producing depth images (b) and normal images (c). Colors (b) can then be obtained by GS optimization, using a depth-guided video generation model (Wan 2.1), while the normal images (c) effectively highlight fine folds and wrinkles. Our synthesized geometries are of high quality and can be directly converted into meshes (d) via Poisson reconstruction. The highlighted regions demonstrate fine-grained garment dynamics that faithfully follow human motion.
  • Figure 2: (a) The animation model generates latents autoregressively. (b) The latent flow-matching model samples detailed geometry from the latent space.
  • Figure 3: Visualization of the density distribution of mapped points. Green indicates fewer points; red indicates more.
  • Figure 4: Comparison across different sampling counts. We use GS-rendered normal maps because they show detail better, although they may slightly inflate boundaries due to non-zero GS scales; a point cloud rendering is shown (right) for boundary reference.
  • Figure 5: Results for a female subject (the identity corresponds to the first frame of each sequence) wearing an outer garment while running and swinging her arms. The garment deforms naturally and follows the body movements. Our approach is the only one that simultaneously captures the dynamic behavior of the garment, preserves high-quality geometric details, and consistently maintains the identity. A reconstructed mesh is shown in (f) for boundary reference.
  • ...and 5 more figures