HUMOS: Human Motion Model Conditioned on Body Shape

Shashank Tripathi, Omid Taheri, Christoph Lassner, Michael J. Black, Daniel Holden, Carsten Stoll

TL;DR

HUMOS tackles the problem that body shape strongly influences how people move by introducing a shape-conditioned, identity-aware motion generator. It combines a Transformer-based conditional VAE with unpaired data training, leveraging cycle-consistency, differentiable intuitive-physics terms, and a dynamic stability loss based on Zero Moment Point to produce diverse, physically plausible, and dynamically stable motions across identities. Key contributions include non-autoregressive, one-shot generation conditioned on identity $(\beta, \text{G})$, a self-supervised training regime, differentiable IP terms addressing ground contact, and a ZMP-driven stability objective with latent-embedding losses that regularize the latent space. The method enables high-quality shape-conditioned retargeting and improves biomechanical realism, with strong quantitative gains on AMASS-derived benchmarks and supportive perceptual studies, making it applicable to animation, AR/VR, and robotics.

Abstract

Generating realistic human motion is essential for many computer vision and graphics applications. The wide variety of human body shapes and sizes greatly impacts how people move. However, most existing motion models ignore these differences, relying on a standardized, average body. This leads to uniform motion across different body types, where movements don't match their physical characteristics, limiting diversity. To solve this, we introduce a new approach to develop a generative motion model based on body shape. We show that it's possible to train this model using unpaired data by applying cycle consistency, intuitive physics, and stability constraints, which capture the relationship between identity and movement. The resulting model generates diverse, physically plausible, and dynamically stable human motions that are both quantitatively and qualitatively more realistic than current state-of-the-art methods. More details are available on our project page https://CarstenEpic.github.io/humos/.
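
As a rough illustration of the Zero Moment Point (ZMP) idea behind the stability term mentioned in the TL;DR, the sketch below computes the ZMP of a set of point masses with the standard textbook formula and penalizes its distance from a support region. It is not the paper's formulation. The point-mass body model, the circular support region, and the names `zero_moment_point` and `stability_penalty` are all assumptions made for this example, and a training-time version would need to be written in an autodiff framework to be differentiable.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
GRAVITY = 9.81  # m/s^2; assumes z is the up axis and the ground is the plane z = 0


def zero_moment_point(masses: Sequence[float],
                      positions: Sequence[Vec3],
                      accelerations: Sequence[Vec3]) -> Tuple[float, float]:
    """ZMP on the ground plane for point masses (angular-momentum terms ignored)."""
    # Vertical load carried by each point mass: m_i * (z''_i + g).
    loads = [m * (a[2] + GRAVITY) for m, a in zip(masses, accelerations)]
    total = sum(loads)
    if total <= 0.0:
        raise ValueError("no net ground reaction force; ZMP is undefined")
    items = list(zip(loads, masses, positions, accelerations))
    # Each mass contributes its load times its horizontal position, minus the
    # moment of its horizontal inertial force about the ground (m * a * height).
    x = sum(w * p[0] - m * a[0] * p[2] for w, m, p, a in items) / total
    y = sum(w * p[1] - m * a[1] * p[2] for w, m, p, a in items) / total
    return x, y


def stability_penalty(zmp: Tuple[float, float],
                      support_center: Tuple[float, float],
                      support_radius: float) -> float:
    """Hinge penalty: zero while the ZMP stays inside a circular support region
    (a simplifying assumption), growing linearly with distance outside it."""
    dist = math.hypot(zmp[0] - support_center[0], zmp[1] - support_center[1])
    return max(0.0, dist - support_radius)
```

With zero accelerations the ZMP coincides with the ground projection of the center of mass, and a forward acceleration shifts it backward, which is why a fast motion can be dynamically unstable even when a static pose with the same center of mass would not be.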

Paper Structure

This paper contains 21 sections, 19 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Visual representation of our architecture. The Encoder takes as input a motion $M_\mathcal{A}$ and its associated identity $\mathcal{I}_\mathcal{A}$, and outputs a latent, identity-invariant encoding of the motion, $z_{M_\mathcal{A}}$. The Decoder takes this latent encoding $z_{M_\mathcal{A}}$ together with a different identity $\mathcal{I}_\mathcal{B}$ and produces a retargeted motion $\hat{M}_{\mathcal{A}\to\mathcal{B}}$ appropriate for that identity. The same Encoder and Decoder are then applied with the original identity to produce a cycle loss $\mathcal{L}_{\text{cycle}}$, while a physics loss $\mathcal{L}_{\text{physics}}$ ensures that the retargeted motion $\hat{M}_{\mathcal{A}\to\mathcal{B}}$ is realistic with respect to the given identity $\mathcal{I}_\mathcal{B}$ and prevents the cycle-consistency loss from collapsing to a trivial solution.
  • Figure 2: Qualitative comparison of shape-conditioned motion generation. Each row shows generations from different methods for a unique body shape and gender. HUMOS-generated motions are more realistic, physically plausible, and dynamically stable than those of the baselines. Red circles on the baseline results highlight issues such as floating, penetrations, and foot skating; the corresponding, more realistic HUMOS results are highlighted in green. Zoom in.
  • Figure S.1: Effect of body shape across (left) interpolated $\beta$ parameters, (center) 150 frames for 6 different identities, and (right) different identities for the same jumping-jack frame. Zoom in.
  • Figure S.2: Additional qualitative comparison of shape-conditioned motion generation. Each row represents generations across different methods for a unique body shape and gender. The difference in quality between methods is particularly evident in their interaction with the ground. Zoom in.
  • Figure S.3: Mean and standard deviation of the first 10 $\beta$ parameters in AMASS, illustrating the diversity of body shapes.
  • ...and 1 more figures
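
The encode, retarget, and cycle-back pass described in the Figure 1 caption can be sketched with toy stand-ins. This is a minimal illustration under loud assumptions: motions are lists of one scalar per frame, an identity is a single scale factor, the cycle loss is a mean absolute error, and `cycle_pass`, `scale_encode`, `scale_decode`, and `passthrough` are hypothetical names rather than the paper's Transformer-based networks or its actual loss.

```python
from typing import Callable, List, Tuple

Motion = List[float]                        # toy motion: one scalar feature per frame
Coder = Callable[[Motion, float], Motion]   # (motion or latent, identity) -> output


def cycle_pass(encode: Coder, decode: Coder, motion_a: Motion,
               id_a: float, id_b: float) -> Tuple[Motion, float]:
    """Retarget A -> B, map back B -> A, and return (retargeted motion, cycle loss)."""
    z_a = encode(motion_a, id_a)            # identity-invariant latent
    motion_ab = decode(z_a, id_b)           # retargeted to identity B
    z_ab = encode(motion_ab, id_b)          # re-encode with identity B
    motion_aba = decode(z_ab, id_a)         # decode back with the original identity
    # Assumed L1 reconstruction error between the input and the cycled motion.
    loss = sum(abs(x - y) for x, y in zip(motion_a, motion_aba)) / len(motion_a)
    return motion_ab, loss


# Toy identity-aware pair: the latent is the motion normalized by body scale.
def scale_encode(motion: Motion, identity: float) -> Motion:
    return [v / identity for v in motion]


def scale_decode(latent: Motion, identity: float) -> Motion:
    return [v * identity for v in latent]


# Degenerate pair that ignores identity entirely.
def passthrough(values: Motion, identity: float) -> Motion:
    return list(values)


motion = [0.9, 1.0, 1.1]
retargeted, loss = cycle_pass(scale_encode, scale_decode, motion, 1.0, 2.0)
copied, trivial_loss = cycle_pass(passthrough, passthrough, motion, 1.0, 2.0)
```

Both pairs reach zero cycle loss, but the pass-through pair returns the input motion unchanged for the new identity. That is the trivial solution the caption refers to, and it is why an additional term that scores the retargeted motion against the target identity is needed alongside the cycle loss.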