Generative Human Geometry Distribution
Xiangjun Tang, Biao Zhang, Peter Wonka
TL;DR
This work introduces Generative Human Geometry Distribution, a framework that models distributions of human geometries by encoding each geometry distribution as a compact 2D feature map, rather than as network parameters, and by using the SMPL domain as the learning space. It replaces the standard Gaussian source distribution with an SMPL-based one and employs a two-stage flow-based pipeline: the first stage compresses geometry distributions into latent feature maps, and the second learns a distribution over these maps, with both stages conditioned on SMPL. The method enables pose-conditioned random avatar generation and avatar-consistent novel pose synthesis. It improves geometry quality by 57% over state-of-the-art methods while remaining robust to pose variation and conditioning mismatches. Because it learns a distribution over distributions, the approach avoids the memory and computation costs of representing each geometry with its own network, which makes scalable, high-fidelity 3D human synthesis and animation more practical.
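To make "encoding a geometry distribution as a 2D feature map" more concrete, the sketch below shows one plausible way such a map could be queried: each source point carries a 2D coordinate (here assumed to be an SMPL UV coordinate in [0, 1]^2), and a per-point feature is read from the map by bilinear interpolation. The UV layout, the function name `sample_feature_map`, and the use of bilinear lookup are all assumptions for illustration; the summary does not specify how the paper's feature maps are indexed.

```python
import numpy as np


def sample_feature_map(feature_map: np.ndarray, uv: np.ndarray) -> np.ndarray:
    """Bilinearly sample a (H, W, C) feature map at N coordinates in [0, 1]^2.

    Hypothetical helper: assumes uv[:, 0] indexes width and uv[:, 1] height.
    Returns an (N, C) array of per-point features.
    """
    H, W, _ = feature_map.shape
    # Map normalized coordinates to continuous pixel coordinates.
    x = np.clip(uv[:, 0], 0.0, 1.0) * (W - 1)
    y = np.clip(uv[:, 1], 0.0, 1.0) * (H - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    y1 = np.minimum(y0 + 1, H - 1)
    # Fractional offsets act as interpolation weights, shape (N, 1).
    wx = (x - x0)[:, None]
    wy = (y - y0)[:, None]
    # Interpolate along width on the two neighboring rows, then along height.
    top = (1.0 - wx) * feature_map[y0, x0] + wx * feature_map[y0, x1]
    bottom = (1.0 - wx) * feature_map[y1, x0] + wx * feature_map[y1, x1]
    return (1.0 - wy) * top + wy * bottom
```

In a setup like this, the sampled features would be passed, together with the point position and time, to a shared velocity network, so that one network serves every subject and only the 2D map changes per geometry.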
Abstract
Realistic human geometry generation is an important yet challenging task that requires both preserving fine clothing details and accurately modeling clothing-body interactions. To tackle this challenge, we build upon geometry distributions, a recently proposed representation that can model a single human geometry with high fidelity using a flow matching model. However, extending this single-geometry formulation to an entire dataset is non-trivial and inefficient for large-scale learning. To address this, we propose a new geometry distribution model built on two key techniques: (1) encoding distributions as 2D feature maps rather than network parameters, and (2) using the SMPL model as the source domain instead of a Gaussian and refining the associated flow velocity field. We then design a generative framework with a two-stage training paradigm analogous to state-of-the-art image and 3D generative models. In the first stage, we compress geometry distributions into a latent space using a diffusion flow model; in the second stage, we train another flow model on this latent space. We validate our approach on two key tasks: pose-conditioned random avatar generation and avatar-consistent novel pose synthesis. Experimental results show that our method outperforms existing state-of-the-art methods, achieving a 57% improvement in geometry quality.
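The central change in technique (2) is that the flow starts from points on a body template rather than from Gaussian noise. The sketch below illustrates flow matching with such a non-Gaussian source under stated assumptions: it uses the common straight-line interpolation path, treats a unit sphere as a hypothetical stand-in for the SMPL body surface, and assumes source and target points are already paired. It does not model the paper's refinement of the velocity field, and `velocity_fn` stands in for a learned network.

```python
import numpy as np


def sample_sphere(n: int, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical stand-in for points sampled on an SMPL body surface."""
    v = rng.normal(size=(n, 3))
    return v / np.linalg.norm(v, axis=1, keepdims=True)


def flow_matching_targets(x0, x1, t):
    """Straight-line path between paired source points x0 and target points x1."""
    t = t[:, None]
    xt = (1.0 - t) * x0 + t * x1
    ut = x1 - x0  # velocity of the straight path, constant in t
    return xt, ut


def flow_matching_loss(velocity_fn, x0, x1, rng):
    """Mean squared error between predicted and target velocities at random t."""
    t = rng.uniform(size=x0.shape[0])
    xt, ut = flow_matching_targets(x0, x1, t)
    pred = velocity_fn(xt, t)
    return float(np.mean(np.sum((pred - ut) ** 2, axis=1)))


def euler_integrate(velocity_fn, x0, steps=32):
    """Transport source points from t = 0 to t = 1 with explicit Euler steps."""
    x = x0.copy()
    dt = 1.0 / steps
    for k in range(steps):
        t = np.full(x.shape[0], k * dt)
        x = x + dt * velocity_fn(x, t)
    return x
```

As a toy check, if the target is the source pushed outward by a factor `s` (a crude "clothing layer"), the exact velocity along the straight path is `(s - 1) / (1 + (s - 1) t) * x`. Plugging that into `flow_matching_loss` gives a loss of essentially zero, and `euler_integrate` maps each sphere point onto the scaled shell. Starting from a body-shaped source means the flow only has to learn a residual deformation instead of carving a full body out of noise, which is the intuition behind replacing the Gaussian.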
