Interactive Character Control with Auto-Regressive Motion Diffusion Models
Yi Shi, Jingbo Wang, Xuekun Jiang, Bingkun Lin, Bo Dai, Xue Bin Peng
TL;DR
The paper tackles real-time, controllable motion synthesis for virtual characters by introducing the Auto-regressive Motion Diffusion Model (A-MDM), a lightweight diffusion framework trained with DDPM that generates motion frame by frame, with each frame conditioned on the previous one. It also presents three interaction-centric control methods (task-oriented sampling, conditional inpainting, and hierarchical reinforcement learning) that adapt a pre-trained A-MDM to new tasks without retraining while maintaining high fidelity and diversity. Key contributions include an MLP-based diffusion model that needs only a few denoising steps, a scheduled (student-forcing) training strategy to mitigate drift, and the three control modalities, which enable flexible downstream task execution on AMASS, 100STYLE, and LaFAN1. The results show motion quality and diversity competitive with state-of-the-art auto-regressive methods, real-time performance, and robust task generalization, which points to practical use in interactive graphics, games, and VR.
Abstract
Real-time character control is an essential component for interactive experiences, with a broad range of applications, including physics simulations, video games, and virtual reality. The success of diffusion models for image synthesis has led to the use of these models for motion synthesis. However, the majority of these motion diffusion models are primarily designed for offline applications, where space-time models are used to synthesize an entire sequence of frames simultaneously with a pre-specified length. To enable real-time motion synthesis with a diffusion model that allows time-varying controls, we propose A-MDM (Auto-regressive Motion Diffusion Model). Our conditional diffusion model takes an initial pose as input, and auto-regressively generates successive motion frames conditioned on the previous frame. Despite its streamlined network architecture, which uses simple MLPs, our framework is capable of generating diverse, long-horizon, and high-fidelity motion sequences. Furthermore, we introduce a suite of techniques for incorporating interactive controls into A-MDM, such as task-oriented sampling, in-painting, and hierarchical reinforcement learning. These techniques enable a pre-trained A-MDM to be efficiently adapted for a variety of new downstream tasks. We conduct a comprehensive suite of experiments to demonstrate the effectiveness of A-MDM, and compare its performance against state-of-the-art auto-regressive methods.
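The auto-regressive generation loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a standard DDPM reverse process with noise (epsilon) prediction, a linear noise schedule with placeholder endpoints, and a pose represented as a flat list of floats. The names `make_schedule`, `toy_denoiser`, `sample_next_frame`, and `rollout` are hypothetical, and `toy_denoiser` is a fixed function standing in for the learned MLP.

```python
import math
import random


def make_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear DDPM noise schedule. The endpoint values are placeholders."""
    if num_steps == 1:
        betas = [beta_end]
    else:
        step = (beta_end - beta_start) / (num_steps - 1)
        betas = [beta_start + i * step for i in range(num_steps)]
    alphas = [1.0 - b for b in betas]
    alpha_bars, running = [], 1.0
    for a in alphas:
        running *= a  # cumulative product of alphas
        alpha_bars.append(running)
    return betas, alphas, alpha_bars


def toy_denoiser(x_t, prev_frame, t_frac):
    """Hypothetical stand-in for the learned MLP (assumption, not A-MDM).

    Returns one noise estimate per pose dimension, given the noisy
    sample, the previous frame, and the normalized diffusion step.
    """
    return [math.tanh(x - p) * (0.5 + 0.5 * t_frac)
            for x, p in zip(x_t, prev_frame)]


def sample_next_frame(denoiser, prev_frame, schedule, rng):
    """Run the DDPM reverse process once, conditioned on prev_frame."""
    betas, alphas, alpha_bars = schedule
    x = [rng.gauss(0.0, 1.0) for _ in prev_frame]  # start from Gaussian noise
    for t in reversed(range(len(betas))):
        eps = denoiser(x, prev_frame, t / len(betas))
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        # No noise is added on the final denoising step (t == 0).
        sigma = math.sqrt(betas[t]) if t > 0 else 0.0
        x = [(xi - coef * ei) / math.sqrt(alphas[t]) + sigma * rng.gauss(0.0, 1.0)
             for xi, ei in zip(x, eps)]
    return x


def rollout(denoiser, initial_pose, num_frames, num_steps=8, seed=0):
    """Generate num_frames frames, each conditioned on the one before it."""
    rng = random.Random(seed)
    schedule = make_schedule(num_steps)
    frames = [list(initial_pose)]
    for _ in range(num_frames):
        frames.append(sample_next_frame(denoiser, frames[-1], schedule, rng))
    return frames
```

Calling `rollout(toy_denoiser, [0.0, 0.1, -0.2], num_frames=5)` returns six frames: the initial pose followed by five generated ones. Because each frame depends only on its predecessor, a control signal could in principle change between frames, which is the property the abstract contrasts with space-time models that synthesize a fixed-length sequence at once.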
