Taming Diffusion Probabilistic Models for Character Control

Rui Chen, Mingyi Shi, Shaoli Huang, Ping Tan, Taku Komura, Xuelin Chen

TL;DR

The paper tackles real-time, high-quality, and diverse character control with diffusion-based motion models. It introduces CAMDM, a transformer-based Conditional Autoregressive Motion Diffusion Model that conditions on past motion and coarse user controls to generate future motion in real time. Its key designs are separate condition tokenization, classifier-free guidance on past motion (CFG-PM), and heuristic future trajectory extension (HFTE), and it needs only 8 denoising steps. Experiments on the 100STYLE mocap dataset show better motion quality, diversity, and style transitions than the baselines, supported by ablations and a broad set of metrics. A single unified model can animate characters in multiple styles in real time, which is relevant to games and interactive media; the paper also outlines avenues for speedups and multimodal control extensions.

Abstract

We present a novel character control framework that effectively utilizes motion diffusion probabilistic models to generate high-quality and diverse character animations, responding in real time to a variety of dynamic user-supplied control signals. At the heart of our method lies a transformer-based Conditional Autoregressive Motion Diffusion Model (CAMDM), which takes as input the character's historical motion and can generate a range of diverse potential future motions conditioned on high-level, coarse user control. To meet the demands for diversity, controllability, and computational efficiency required by a real-time controller, we incorporate several key algorithmic designs. These include separate condition tokenization, classifier-free guidance on past motion, and heuristic future trajectory extension, all designed to address the challenges associated with taming motion diffusion probabilistic models for character control. As a result, our work represents the first model that enables real-time generation of high-quality, diverse character animations based on interactive user control, and it supports animating the character in multiple styles with a single unified model. We evaluate our method on a diverse set of locomotion skills, demonstrating the merits of our method over existing character controllers. Project page and source code: https://aiganimation.github.io/CAMDM/

Paper Structure (27 sections, 5 equations, 4 figures, 5 tables)

This paper contains 27 sections, 5 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: Conditional Autoregressive Motion Diffusion Model (CAMDM). At each denoising step, the model takes as input a noisy motion sample $\mathbf{x}_t$, diffusion step $t$, along with various conditions including the past motion $\mathbf{p}$, style label $\mathbf{c}_l$, future root displacements $\mathbf{c}_{rv}$, and orientations $\mathbf{c}_{ro}$ projected onto the ground, and learns to predict the original clean $\hat{\mathbf{x}}_0$.
  • Figure 2: Illustration of heuristic future trajectory extension. Top: At $t_{cur}$, when the user input changes and autoregression is triggered, the model-predicted future trajectory is shorter than the user-supplied synthetic future trajectory, because multiple postures in the generation have already been applied to the character. This could result in a blending of two trajectories of different lengths, consequently causing sudden jittering in the next generation if not properly addressed. Bottom: Assuming that the model generates $F = 10$ frames, we simply reuse the last $K = 4$ points of the predicted future trajectory multiple times, if necessary, until the length matches that of the user-supplied synthetic future trajectory. For each recycle, we establish a local frame at the last trajectory point based on its position and orientation, then flip the position and copy the orientation of the last $K$ points to extend the trajectory.
  • Figure 3: Visual comparisons of single-style control. From top to bottom: Ours, LMP, MANN-DP, MM-DP and MoGlow, all using the same control inputs. The trajectory of the left hand, right hand, and root are colored in red, green, and black, respectively. Observe the high diversity and better adherence to the control trajectory of our method compared to the other baselines.
  • Figure 4: Visual comparisons of multi-style control. From top to bottom: Ours, LMP, MANN-DP, MM-DP and MoGlow. Our method can transition naturally between distinct styles ("LeftHop" and "RightHop" in this case), whereas baselines fail.
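
The trajectory extension described in the Figure 2 caption can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that trajectory points are 2D ground-plane positions with a yaw angle each, and that "flip the position" means point-reflecting the $K$ points that precede the current last point through that last point (the origin of the local frame), while orientations are copied from the reflected source points. The function name `extend_trajectory` and its parameters are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def extend_trajectory(
    positions: List[Point],
    orientations: List[float],
    target_len: int,
    k: int = 4,
) -> Tuple[List[Point], List[float]]:
    """Extend a predicted ground-plane trajectory to target_len points.

    Assumption (not confirmed by the source): "flipping" is a point
    reflection of the k points preceding the last point through that last
    point. Orientations (yaw, radians) are copied from the source points.
    """
    if k < 1 or len(positions) <= k:
        raise ValueError("need more than k trajectory points")
    if len(positions) != len(orientations):
        raise ValueError("positions and orientations must have equal length")

    pos = list(positions)
    ori = list(orientations)
    # Recycle the tail as many times as needed to reach the target length.
    while len(pos) < target_len:
        ax, ay = pos[-1]          # anchor: origin of the local frame
        src_pos = pos[-1 - k:-1]  # the k points just before the anchor
        src_ori = ori[-1 - k:-1]
        # Walk backwards from the anchor so the mirrored points move forward.
        for (px, py), theta in zip(reversed(src_pos), reversed(src_ori)):
            pos.append((2 * ax - px, 2 * ay - py))
            ori.append(theta)
    # The last recycle may overshoot, so trim to the requested length.
    return pos[:target_len], ori[:target_len]


if __name__ == "__main__":
    # F = 10 predicted points on a straight line, extended to 16 points.
    predicted = [(float(i), 0.0) for i in range(10)]
    yaw = [0.0] * 10
    extended, _ = extend_trajectory(predicted, yaw, target_len=16, k=4)
    print(extended[-1])  # (15.0, 0.0)
```

Under this reading, a straight predicted trajectory continues along the same line, and a curved one continues with the curvature of its tail mirrored. Whether this matches the paper's exact flipping rule is an assumption.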