Perpetual Humanoid Control for Real-time Simulated Avatars

Zhengyi Luo, Jinkun Cao, Alexander Winkler, Kris Kitani, Weipeng Xu

TL;DR

The paper introduces the Perpetual Humanoid Controller (PHC), a physics-based motion imitator that drives real-time avatars without resets and tolerates noisy inputs. It proposes a Progressive Multiplicative Control Policy (PMCP) that allocates new network capacity for progressively harder motion sequences: primitives are trained one after another, and a composer then fuses them. This enables scalable imitation of the AMASS dataset and the addition of fail-state recovery without catastrophic forgetting. The approach integrates an Adversarial Motion Prior to encourage natural, human-like motion and accepts input from video-based pose estimators or language-based motion generators, including a keypoint-based variant that reduces reliance on joint rotations. PHC achieves state-of-the-art imitation performance (up to 98.9% success on MoCap data) and demonstrates real-time avatar control from video or language prompts, recovering from falls and walking back to far-away reference motion. The work offers a practical pathway to perpetual, physically grounded avatars for telepresence, gaming, and embodied AI, while outlining directions for tighter pose-estimator integration and terrain-aware interactions.

Abstract

We present a physics-based humanoid controller that achieves high-fidelity motion imitation and fault-tolerant behavior in the presence of noisy input (e.g. pose estimates from video or generated from language) and unexpected falls. Our controller scales up to learning ten thousand motion clips without using any external stabilizing forces and learns to naturally recover from fail-state. Given reference motion, our controller can perpetually control simulated avatars without requiring resets. At its core, we propose the progressive multiplicative control policy (PMCP), which dynamically allocates new network capacity to learn harder and harder motion sequences. PMCP allows efficient scaling for learning from large-scale motion databases and adding new tasks, such as fail-state recovery, without catastrophic forgetting. We demonstrate the effectiveness of our controller by using it to imitate noisy poses from video-based pose estimators and language-based motion generators in a live and real-time multi-person avatar use case.
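
The abstract only names the multiplicative control policy, so the following is a minimal sketch of what "multiplicative" composition usually means, not the authors' implementation. It assumes each primitive outputs a diagonal Gaussian over actions and that a composer supplies one non-negative weight per primitive; under that assumption, a weighted product of Gaussians is again a Gaussian whose precision is the weighted sum of the primitives' precisions. The function name `compose_gaussian_primitives` and its argument layout are hypothetical.

```python
from typing import List, Sequence, Tuple


def compose_gaussian_primitives(
    means: Sequence[Sequence[float]],
    stds: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Fuse K diagonal-Gaussian primitives by a weighted product.

    means[i][j] and stds[i][j] describe primitive i on action dimension j.
    weights[i] is the (assumed non-negative) composer weight of primitive i.
    Returns the mean and standard deviation of the composite Gaussian.
    """
    if not (len(means) == len(stds) == len(weights)):
        raise ValueError("means, stds and weights must have the same length")

    action_dim = len(means[0])
    out_mean: List[float] = []
    out_std: List[float] = []
    for j in range(action_dim):
        # Composite precision: weighted sum of per-primitive precisions.
        precision = sum(w / (s[j] ** 2) for w, s in zip(weights, stds))
        if precision <= 0.0:
            raise ValueError("at least one primitive needs a positive weight")
        # Composite mean: precision-weighted average of primitive means.
        weighted_mean = sum(
            w * m[j] / (s[j] ** 2) for w, m, s in zip(weights, means, stds)
        )
        out_mean.append(weighted_mean / precision)
        out_std.append(precision ** -0.5)
    return out_mean, out_std


if __name__ == "__main__":
    # Two 1-D primitives with equal weight and equal spread.
    mean, std = compose_gaussian_primitives(
        means=[[0.0], [2.0]], stds=[[1.0], [1.0]], weights=[0.5, 0.5]
    )
    print(mean, std)
```

A primitive with weight zero drops out of the product entirely, which is one way to see why frozen primitives can be combined without overwriting each other.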

Paper Structure (56 sections, 9 equations, 7 figures, 6 tables, 1 algorithm)

This paper contains 56 sections, 9 equations, 7 figures, 6 tables, 1 algorithm.

Figures (7)

  • Figure 1: We propose a motion imitator that can naturally recover from falls and walk to far-away reference motion, perpetually controlling simulated avatars without requiring reset. Left: real-time avatars from video, where the blue humanoid recovers from a fall. Right: Imitating 3 disjoint clips of motion generated from language, where our controller fills in the blank. The color gradient indicates the passage of time.
  • Figure 2: Our progressive training procedure to train primitives $\boldsymbol{\mathcal{P}}^{(1)}, \boldsymbol{\mathcal{P}}^{(2)}, \cdots, \boldsymbol{\mathcal{P}}^{(K)}$ by gradually learning harder and harder sequences. Fail recovery $\boldsymbol{\mathcal{P}}^{(F)}$ is trained in the end on simple locomotion data; a composer is then trained to combine these frozen primitives.
  • Figure 3: Goal-conditioned RL framework with Adversarial Motion Prior. Each primitive $\boldsymbol{\mathcal{P}}^{(k)}$ and composer $\boldsymbol{\mathcal{C}}$ is trained using the same procedure, and here we visualize the final product ${\pi_{\text{PHC}}}$.
  • Figure 4: (a) Imitating high-quality MoCap -- spin and kick. (b) Recovering from a fallen state and returning to the reference motion (indicated by red dots). (c) Imitating noisy motion estimated from video. (d) Imitating motion generated from language. (e) Using poses estimated from a webcam stream for a real-time simulated avatar.
  • Figure 5: Our framework can support body shape and gender variations. Here we showcase humanoids of different genders and body proportions holding a standing pose. We construct two kinds of humanoids: capsule-based (top) and mesh-based (bottom). Red: female, Blue: male. Color gradient indicates weight.
  • ...and 2 more figures
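
The Figure 2 caption describes primitives trained on "harder and harder sequences". One plausible reading, sketched below, is a hard-sequence mining loop: train a primitive on the current clip set, keep only the clips it still fails on, and train the next primitive on those. This is an illustration under stated assumptions rather than the paper's procedure. The callables `train_primitive` and `succeeds`, the `max_primitives` cap, and the stopping rule are all hypothetical, and the fail-recovery primitive and composer training mentioned in the caption are not modelled.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Clip = TypeVar("Clip")
Primitive = TypeVar("Primitive")


def progressive_hard_mining(
    clips: Sequence[Clip],
    train_primitive: Callable[[List[Clip]], Primitive],
    succeeds: Callable[[Primitive, Clip], bool],
    max_primitives: int,
) -> Tuple[List[Primitive], List[Clip]]:
    """Train primitives one after another on progressively harder clips.

    Each round trains a new primitive on the clips the previous primitive
    failed to imitate. Returns the primitives and any clips still unsolved.
    """
    primitives: List[Primitive] = []
    remaining: List[Clip] = list(clips)
    while remaining and len(primitives) < max_primitives:
        primitive = train_primitive(remaining)
        primitives.append(primitive)
        # Keep only the clips this primitive cannot imitate: the "hard" set.
        remaining = [c for c in remaining if not succeeds(primitive, c)]
    return primitives, remaining


if __name__ == "__main__":
    # Toy stand-ins: a clip is a difficulty score, and a "primitive" is the
    # highest difficulty it can handle.
    prims, unsolved = progressive_hard_mining(
        clips=[1, 2, 3, 4, 5, 6],
        train_primitive=lambda cs: min(cs) + 1,
        succeeds=lambda p, c: c <= p,
        max_primitives=3,
    )
    print(prims, unsolved)
```

In this toy run each primitive covers the easiest part of what is left, so later primitives specialise in clips the earlier ones could not imitate.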