AAMDM: Accelerated Auto-regressive Motion Diffusion Model
Tianyu Li, Calvin Qiao, Guanqiao Ren, KangKang Yin, Sehoon Ha
TL;DR
This work introduces AAMDM, a framework for fast, high-quality, and diverse motion synthesis at interactive rates that models pose transitions in a compact embedded space $\mathbf{xz} \in \mathbb{R}^{64}$. It combines a fast Generation Module based on Denoising Diffusion GANs (DD-GANs) with a Polishing Module based on an Auto-regressive Diffusion Model (ADM), enabling long-horizon, multi-modal motion while remaining efficient. The method is validated on LaFAN1 and artificial datasets, where it achieves motion quality comparable to computationally heavy baselines such as AMDM200 with orders-of-magnitude speedups. Ablations justify each component (embedded space, DD-GANs, ADM, and polishing). The approach targets real-time gaming and VR, where diverse, controllable animations must be delivered at interactive frame rates.
Abstract
Interactive motion synthesis is essential for creating immersive experiences in entertainment applications such as video games and virtual reality. However, generating animations that are both high-quality and contextually responsive remains a challenge. Traditional techniques in the game industry can produce high-fidelity animations but suffer from high computational costs and poor scalability. Trained neural network models alleviate the memory and speed issues, yet fall short in generating diverse motions. Diffusion models offer diverse motion synthesis with low memory usage, but require expensive reverse diffusion processes. This paper introduces the Accelerated Auto-regressive Motion Diffusion Model (AAMDM), a novel motion synthesis framework designed to achieve quality, diversity, and efficiency simultaneously. AAMDM integrates Denoising Diffusion GANs as a fast Generation Module and an Auto-regressive Diffusion Model as a Polishing Module. Furthermore, AAMDM operates in a lower-dimensional embedded space rather than the full-dimensional pose space, which reduces training complexity and further improves performance. Through comprehensive quantitative analyses and visual comparisons, we show that AAMDM outperforms existing methods in motion quality, diversity, and runtime efficiency. We also demonstrate the effectiveness of each algorithmic component through ablation studies.
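The control flow described above (encode a pose, draft the next embedded state with a fast Generation Module, refine it with a Polishing Module, decode, and repeat auto-regressively) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the pose dimension, the step counts, and the toy arithmetic inside each module are hypothetical stand-ins for the learned networks, not the authors' implementation. Only the 64-dimensional embedded space is taken from the text.

```python
import numpy as np

EMBED_DIM = 64   # embedded-space size quoted in the TL;DR
POSE_DIM = 128   # hypothetical full pose dimension, chosen for illustration

# Hypothetical stand-ins for the learned encoder/decoder: fixed random linear maps.
_init_rng = np.random.default_rng(0)
W_ENC = _init_rng.standard_normal((POSE_DIM, EMBED_DIM)) / np.sqrt(POSE_DIM)
W_DEC = _init_rng.standard_normal((EMBED_DIM, POSE_DIM)) / np.sqrt(EMBED_DIM)


def encode(pose):
    """Map a full pose vector into the compact embedded space."""
    return pose @ W_ENC


def decode(z):
    """Map an embedded state back to a full pose vector."""
    return z @ W_DEC


def generation_module(z_prev, rng, steps=2):
    """Toy stand-in for a few-step DD-GAN sampler.

    Starts from Gaussian noise and takes a small number of large denoising
    steps conditioned on the previous embedded state.
    """
    z = rng.standard_normal(EMBED_DIM)
    for _ in range(steps):
        z = 0.5 * z + 0.5 * z_prev  # placeholder for a learned denoiser
    return z


def polishing_module(z_draft, z_prev, steps=2):
    """Toy stand-in for a few refinement steps of an auto-regressive diffusion model."""
    z = z_draft
    for _ in range(steps):
        z = z + 0.1 * (z_prev - z)  # placeholder for a learned refinement step
    return z


def rollout(pose0, num_frames, seed=0):
    """Auto-regressive synthesis: each frame is conditioned on the previous one."""
    rng = np.random.default_rng(seed)
    z = encode(pose0)
    poses = []
    for _ in range(num_frames):
        draft = generation_module(z, rng)
        z = polishing_module(draft, z)
        poses.append(decode(z))
    return np.stack(poses)


if __name__ == "__main__":
    motion = rollout(np.zeros(POSE_DIM), num_frames=5)
    print(motion.shape)
```

Because each frame starts from fresh noise, different seeds give different rollouts from the same initial pose, which is the mechanism behind the diversity claim. The efficiency claim corresponds to keeping `steps` small in both modules rather than running a long reverse diffusion chain.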
