
Interactive Character Control with Auto-Regressive Motion Diffusion Models

Yi Shi, Jingbo Wang, Xuekun Jiang, Bingkun Lin, Bo Dai, Xue Bin Peng

TL;DR

The paper tackles real-time, controllable motion synthesis for virtual characters by introducing Auto-regressive Motion Diffusion Models (A-MDM). A-MDM is a lightweight auto-regressive diffusion framework, trained with DDPM, that generates motion frame by frame, conditioning each new frame on the previous one. The paper also introduces a suite of interaction-centric control methods (task-oriented sampling, conditional inpainting, and hierarchical reinforcement learning) that adapt a pre-trained A-MDM to new tasks without retraining the base model, while maintaining high fidelity and diversity. Key contributions include a lightweight MLP-based diffusion model that needs only a few denoising steps, a novel scheduled (student-forcing) training strategy that mitigates drift, and three control modalities for flexible downstream tasks, evaluated on AMASS, 100STYLE, and LaFAN1. The results show motion quality and diversity competitive with state-of-the-art auto-regressive methods, together with real-time performance and robust task generalization, which suggests practical potential for interactive graphics, games, and VR.
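The training objective summarized above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the linear noise schedule, `T = 10`, the `eps_model(x_t, t, prev_frame)` signature, and the linear ramp in `student_forcing_prob` are all hypothetical. The sketch shows the standard DDPM noise-prediction loss with the previous frame as the condition, plus one possible way to schedule how often that condition comes from the model's own rollout instead of the ground truth.

```python
import numpy as np

# Hypothetical linear noise schedule with few diffusion steps (not the paper's values).
T = 10
betas = np.linspace(1e-4, 0.2, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def q_sample(x0, t, eps):
    """Forward diffusion: noise a clean frame x0 to step t (0-indexed)."""
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

def training_loss(eps_model, prev_frame, next_frame, rng):
    """DDPM noise-prediction loss for one (condition, target) frame pair.

    prev_frame is the conditioning frame: the ground-truth previous frame under
    teacher forcing, or the model's own previous output under student forcing."""
    t = int(rng.integers(0, T))                  # random diffusion step
    eps = rng.standard_normal(next_frame.shape)  # noise the model must predict
    x_t = q_sample(next_frame, t, eps)
    eps_hat = eps_model(x_t, t, prev_frame)
    return float(np.mean((eps - eps_hat) ** 2))

def student_forcing_prob(epoch, start, end):
    """Hypothetical schedule: probability of conditioning on the model's own
    prediction, ramping linearly from 0 (teacher forcing) at `start` to 1
    (full student forcing) at `end`."""
    if epoch <= start:
        return 0.0
    if epoch >= end:
        return 1.0
    return (epoch - start) / (end - start)
```

During the scheduled phase, a draw such as `rng.random() < student_forcing_prob(epoch, start, end)` would decide whether `prev_frame` is the ground-truth frame or the model's own previous prediction.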

Abstract

Real-time character control is an essential component for interactive experiences, with a broad range of applications, including physics simulations, video games, and virtual reality. The success of diffusion models for image synthesis has led to the use of these models for motion synthesis. However, the majority of these motion diffusion models are primarily designed for offline applications, where space-time models are used to synthesize an entire sequence of frames simultaneously with a pre-specified length. To enable real-time motion synthesis with a diffusion model that allows time-varying controls, we propose A-MDM (Auto-regressive Motion Diffusion Model). Our conditional diffusion model takes an initial pose as input and auto-regressively generates successive motion frames conditioned on the previous frame. Despite its streamlined network architecture, which uses simple MLPs, our framework is capable of generating diverse, long-horizon, and high-fidelity motion sequences. Furthermore, we introduce a suite of techniques for incorporating interactive controls into A-MDM, such as task-oriented sampling, inpainting, and hierarchical reinforcement learning. These techniques enable a pre-trained A-MDM to be efficiently adapted for a variety of new downstream tasks. We conduct a comprehensive suite of experiments to demonstrate the effectiveness of A-MDM, and compare its performance against state-of-the-art auto-regressive methods.
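The auto-regressive generation described in the abstract can be illustrated with a minimal NumPy sketch. It assumes a hypothetical `eps_model(x_t, t, prev_frame)` noise predictor and a hypothetical 10-step linear schedule, and uses standard DDPM ancestral sampling with variance β_t. The optional `cost_grad` hook is also an assumption: it stands in for task-oriented sampling by applying classifier-style guidance to the predicted clean frame, which is one common way to steer a diffusion sampler and not necessarily the paper's exact formulation.

```python
import numpy as np

# Hypothetical linear schedule with few steps (not the paper's exact values).
T = 10
betas = np.linspace(1e-4, 0.2, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def sample_next_frame(eps_model, prev_frame, rng, cost_grad=None, scale=0.0):
    """DDPM ancestral sampling of one frame, conditioned on prev_frame.

    If cost_grad is given, the predicted clean frame x0_hat is nudged by
    -scale * cost_grad(x0_hat) at every step (classifier-style guidance,
    an assumed stand-in for task-oriented sampling)."""
    x = rng.standard_normal(prev_frame.shape)  # x_T ~ N(0, I)
    for t in reversed(range(T)):
        eps_hat = eps_model(x, t, prev_frame)
        if cost_grad is not None:
            # Recover x0_hat from the noise prediction, guide it, then re-derive eps_hat.
            x0_hat = (x - np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alpha_bars[t])
            x0_hat = x0_hat - scale * cost_grad(x0_hat)
            eps_hat = (x - np.sqrt(alpha_bars[t]) * x0_hat) / np.sqrt(1.0 - alpha_bars[t])
        # Mean of x_{t-1} given x_t and the noise prediction.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        noise = rng.standard_normal(x.shape) if t > 0 else 0.0  # no noise on the last step
        x = mean + np.sqrt(betas[t]) * noise
    return x

def rollout(eps_model, init_pose, num_frames, rng, **guidance):
    """Auto-regressive generation: each new frame is conditioned on the last one."""
    frames = [init_pose]
    for _ in range(num_frames):
        frames.append(sample_next_frame(eps_model, frames[-1], rng, **guidance))
    return np.stack(frames)
```

As a sanity check, a quadratic cost (gradient `x0 - goal`) with `scale=1.0` replaces the guided prediction with the goal, so the final denoising step returns the goal frame exactly; smaller values of `scale` blend the goal with the model's own prediction.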

Paper Structure

This paper contains 40 sections, 11 equations, 12 figures, and 6 tables.

Figures (12)

  • Figure 1: Framework of A-MDM. A-MDM is trained following DDPM [ho2020denoising]. During training, A-MDM learns to predict the noise vector $\epsilon^t_f$ sampled at each diffusion step. After training, A-MDM can auto-regressively generate long-horizon motion of arbitrary length under different control strategies.
  • Figure 2: Trajectories for target reaching. Our framework generates diverse motion trajectories from the same initial state and target goals.
  • Figure 3: Inpainting can generate seamless transitions between user-specified motions and arbitrary character states. We introduce a series of buffer frames in which inpainting stops at an early diffusion step, whereas the frames that play out the user-imposed target motion are inpainted through the last denoising step.
  • Figure 4: Transition in-betweening through inpainting. To generate smoother transitions, we initialize the denoising process with the target frame at different denoising steps. As the frames approach the time of the target frame, the denoising process is initialized with the target frame at progressively later denoising steps, so the generated frames conform more closely to the target frame.
  • Figure 5: A-MDM can be used for key-frame in-betweening to generate plausible motions (blue) between user-specified key-frames (white).
  • ...and 7 more figures
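The inpainting schedules described for Figures 3 and 4 can be sketched as two small helpers. This is a hypothetical illustration under stated assumptions (a 10-step linear schedule, boolean masks over the constrained entries, and a linear ramp of start steps), not the paper's code. `inpaint` overwrites the constrained entries with a noised copy of the known motion only while the current step `t` is at or above a per-frame `stop_step`, so buffer frames can release the constraint early while target frames keep it until step 0. `inbetween_start_steps` assigns each in-between frame the step at which denoising starts from the noised target frame, with frames nearer the target starting later in the reverse process (smaller `t`).

```python
import numpy as np

# Hypothetical linear schedule with few steps (not the paper's values).
T = 10
betas = np.linspace(1e-4, 0.2, T)
alpha_bars = np.cumprod(1.0 - betas)

def noise_to_step(x0, t, rng):
    """Noise a clean frame to diffusion step t (0-indexed)."""
    eps = rng.standard_normal(np.shape(x0))
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

def inpaint(x_t, t, known, mask, stop_step, rng):
    """Overwrite the masked entries of x_t with a noised copy of `known`,
    but only while t >= stop_step; afterwards the model denoises freely.
    stop_step = 0 keeps inpainting through the last denoising step."""
    if t < stop_step:
        return x_t
    return np.where(mask, noise_to_step(known, t, rng), x_t)

def inbetween_start_steps(num_frames, T):
    """Hypothetical linear schedule: the frame farthest from the target starts
    denoising at step T-1, the frame adjacent to the target at step 0."""
    if num_frames == 1:
        return [0]
    return [int(round((T - 1) * (1 - k / (num_frames - 1)))) for k in range(num_frames)]
```

Inside a DDPM reverse loop, `inpaint` would be applied to `x_t` before each denoising step; for in-betweening, frame `k` would be initialized as `noise_to_step(target, start_steps[k], rng)` and denoised from that step down to 0.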