Perpetual Motion: Generating Unbounded Human Motion

Yan Zhang, Michael J. Black, Siyu Tang

TL;DR

This work tackles the challenge of generating perpetual, non-deterministic human motion over long horizons from minimal conditioning. It introduces a two-stream cross-conditional variational RNN that jointly models global translation and body pose. Crucially, it applies a Charbonnier penalty to the KL-divergence term, which induces temporal dependencies in the latent sequence without an explicit prior. Trained on the AMASS dataset and evaluated against strong baselines such as QuaterNet and STCN, the approach demonstrates improved representation power, motion-frequency diversity, and naturalness, and its generated sequences remain plausible for at least 10 minutes. The method advances long-horizon motion synthesis for graphics and vision applications and provides a systematic evaluation pipeline covering representation, frequency, diversity, and perceptual naturalness.

Abstract

The modeling of human motion using machine learning methods has been widely studied. In essence, it is a time-series modeling problem involving predicting how a person will move in the future given how they moved in the past. Existing methods, however, typically have a short time horizon, predicting only a few frames to a few seconds of human motion. Here we focus on long-term prediction; that is, generating long sequences (potentially infinite) of human motion that is plausible. Furthermore, we do not rely on a long sequence of input motion for conditioning, but rather can predict how someone will move from as little as a single pose. Such a model has many uses in graphics (video games and crowd animation) and vision (as a prior for human motion estimation or for dataset creation). To address this problem, we propose a model to generate non-deterministic, ever-changing, perpetual human motion, in which the global trajectory and the body pose are cross-conditioned. We introduce a novel KL-divergence term with an implicit, unknown prior. We train this using a heavy-tailed function of the KL divergence of a white-noise Gaussian process, allowing temporal dependencies in the latent sequence. We perform systematic experiments to verify its effectiveness and find that it is superior to baseline methods.

Paper Structure

This paper contains 39 sections, 1 theorem, 9 equations, 2 figures, 5 tables.

Key Result

Proposition 1

The new KL-divergence term (the paper's equation labeled eq:new_kl) can: (1) lead to a higher ELBO than its counterpart with a standard normal distribution prior, (2) introduce temporal dependencies in the latent space, (3) avoid posterior collapse numerically, and (4) retain a low computational cost.
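
To make the proposition concrete, here is a minimal sketch of a Charbonnier-penalized KL term. It is not the authors' implementation. It assumes the penalty has the common form sqrt(1 + s^2) - 1 and that it is applied to the closed-form KL divergence between a diagonal-Gaussian posterior and a standard normal (white-noise) prior, summed over time steps and latent dimensions. The function names and array shapes are hypothetical.

```python
import numpy as np


def kl_to_standard_normal(mu, logvar):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) ).

    mu and logvar have shape (T, D): T time steps, D latent dimensions
    (an assumed layout). The result is summed over both axes.
    """
    return 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar)


def charbonnier(s):
    """Assumed Charbonnier form: about s^2 / 2 near zero, about linear for large s."""
    return np.sqrt(1.0 + s ** 2) - 1.0


def robust_kl(mu, logvar):
    """Heavy-tailed function of the KL divergence (sketch under the assumptions above)."""
    return charbonnier(kl_to_standard_normal(mu, logvar))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mu = rng.normal(size=(8, 16))      # hypothetical posterior means
    logvar = np.zeros((8, 16))         # unit variance, for illustration only
    print("plain KL :", kl_to_standard_normal(mu, logvar))
    print("robust KL:", robust_kl(mu, logvar))
```

Under this assumed form, the penalized value never exceeds the plain KL for non-negative inputs, because sqrt(1 + s^2) <= 1 + s. That is consistent with claim (1). The extra cost is a single scalar operation on top of the usual KL computation, in line with claim (4).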

Figures (2)

  • Figure 1: Given an initial body configuration, our method generates "perpetual" human motion with ever-changing limb poses. After 10 minutes, the body pose still varies, and the motion remains realistic.
  • Figure 2: Illustration of our cross-conditional two-stream variational RNN architecture. The blue and orange colors denote the body translation stream and the body pose stream, respectively. The circles denote feature concatenation. The dashed arrows denote random sampling.

Theorems & Definitions (2)

  • Proposition 1
  • proof