Spherical Latent Motion Prior for Physics-Based Simulated Humanoid Control

Jing Tan, Weisheng Xu, Xiangrui Jiang, Jiaxi Zhang, Kun Yang, Kai Wu, Jiaqi Xiong, Shiting Chen, Yangfan Li, Yixiao Feng, Yuetong Fang, Yujia Zou, Yiqun Song, Renjing Xu

TL;DR

The Spherical Latent Motion Prior (SLMP) is a two-stage method for learning motion priors for physics-based humanoid control. It preserves fine motion detail without information loss, and random latent sampling yields semantically valid and stable behaviors.

Abstract

Learning motion priors for physics-based humanoid control is an active research topic. Existing approaches are mainly based on variational autoencoders (VAEs) and adversarial motion priors (AMP). VAEs introduce information loss, and random latent sampling can produce invalid behaviors. AMP suffers from mode collapse and struggles to capture diverse motion skills. We present the Spherical Latent Motion Prior (SLMP), a two-stage method for learning motion priors. In the first stage, we train a high-quality motion tracking controller. In the second stage, we distill the tracking controller into a spherical latent space. A combination of distillation, a discriminator, and a discriminator-guided local semantic consistency constraint shapes a structured latent action space, allowing stable random sampling without information loss. To evaluate SLMP, we collect a two-hour human combat motion capture dataset and show that SLMP preserves fine motion detail without information loss and that random sampling yields semantically valid and stable behaviors. When applied to a two-agent physics-based combat task, SLMP produces human-like and physically plausible combat behaviors using only simple rule-based rewards. Furthermore, SLMP generalizes across different humanoid robot morphologies, demonstrating its transferability beyond a single simulated avatar.
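
To make the idea of a spherical latent space concrete, the following is a minimal sketch of two operations such a space relies on: projecting a raw vector onto the unit sphere, and drawing a latent code uniformly at random from that sphere. It is an illustration under stated assumptions, not the authors' implementation. The function names, the latent dimension `LATENT_DIM`, and the choice of normalizing a Gaussian sample (a standard way to sample uniformly on a sphere) are all assumptions introduced here.

```python
import math
import random


def normalize_to_sphere(vec, eps=1e-8):
    """Project a raw vector onto the unit sphere by dividing by its L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm < eps:
        raise ValueError("cannot normalize a near-zero vector")
    return [x / norm for x in vec]


def sample_uniform_sphere(dim, rng):
    """Draw a point uniformly from the unit sphere in R^dim.

    An isotropic Gaussian is rotation-invariant, so the direction of a
    Gaussian sample is uniformly distributed over the sphere.
    """
    while True:
        raw = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        try:
            return normalize_to_sphere(raw)
        except ValueError:
            # Extremely unlikely: the sample was (numerically) the zero vector.
            continue


LATENT_DIM = 32  # hypothetical size; the latent dimension is not given here
rng = random.Random(0)  # fixed seed for reproducibility
z = sample_uniform_sphere(LATENT_DIM, rng)
print(len(z), math.sqrt(sum(x * x for x in z)))
```

In a setup like the one described, a code such as `z` would be fed to the distilled policy as its latent action. Because every point on the sphere has unit norm, there is no low-density tail region of the kind an unbounded Gaussian latent space has.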

Paper Structure

This paper contains 40 sections, 17 equations, 12 figures, 4 tables, 1 algorithm.

Figures (12)

  • Figure 1: Overview of the Spherical Latent Motion Prior (SLMP). We collect a two-hour combat motion capture dataset and convert it into SMPL motion clips for training. We train a goal-conditioned motion tracking controller, then distill it into a unit-sphere latent space using three losses: imitation, discriminator, and our discriminator-guided local semantic consistency loss ($L_{\text{DLSC}}$). SLMP supports meaningful random sampling and drives downstream tasks such as two-agent combat via simple rewards.
  • Figure 2: Qualitative examples of random latent-conditioned rollouts generated by SLMP. Uniformly sampled latent codes produce diverse, stable, and physically plausible full-body motions. Additional rollouts are provided in the supplemental video.
  • Figure 3: Latent-space motion tracking performance. We evaluate information loss by tracking reference clips through the latent space. SLMP achieves a higher success rate and a lower mean per-joint position error (MPJPE) than PULSE, and approaches the expert controller.
  • Figure 4: Qualitative comparison of random latent-conditioned rollouts. ASE exhibits repetitive low-diversity behaviors, PULSE collapses occasionally, and SLMP generates diverse and stable motions. See supplemental video for full rollouts.
  • Figure 5: Random latent rollout survival curves over 1000 trials in Isaac Gym. SLMP maintains substantially higher survival rates across long horizons, whereas PULSE survival degrades due to semantic sparsity in the tails of its VAE latent space.
  • ...and 7 more figures
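
The two quantities reported in Figures 3 and 5 can be sketched as follows: a per-frame MPJPE and an empirical survival curve over random-rollout trials. This is a hedged illustration, not the paper's evaluation code. The function names, the input layout, and the convention that a trial which never terminates is recorded as `None` are assumptions made here. Whether joint positions are global or root-relative, and how a rollout is judged to have failed, are not specified in this excerpt.

```python
import math


def mpjpe(pred, ref):
    """Mean per-joint position error for a single frame.

    pred, ref: sequences of (x, y, z) joint positions, in the same joint
    order and the same units.
    """
    if len(pred) != len(ref) or not pred:
        raise ValueError("pred and ref must be non-empty and equally long")
    return sum(math.dist(p, r) for p, r in zip(pred, ref)) / len(pred)


def survival_curve(termination_steps, horizon):
    """Fraction of trials still running at each step 0..horizon.

    termination_steps[i] is the step at which trial i terminated, or None
    if the trial lasted the whole horizon (assumed convention).
    """
    n = len(termination_steps)
    if n == 0:
        raise ValueError("need at least one trial")
    curve = []
    for t in range(horizon + 1):
        alive = sum(1 for s in termination_steps if s is None or s > t)
        curve.append(alive / n)
    return curve


# Toy data: two joints, one of them off by 1 unit along z.
pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
ref = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]
print(mpjpe(pred, ref))

# Four toy trials: two terminate early (steps 2 and 4), two survive.
print(survival_curve([2, None, 4, None], horizon=4))
```

For a full clip, the per-frame `mpjpe` values would typically be averaged over frames. The survival curve is non-increasing by construction, so curves for different methods can be compared directly at any horizon.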