AAMDM: Accelerated Auto-regressive Motion Diffusion Model

Tianyu Li, Calvin Qiao, Guanqiao Ren, KangKang Yin, Sehoon Ha

TL;DR

This work introduces AAMDM, a framework for accelerated, high-quality, and diverse motion synthesis at interactive rates by modeling transitions in a compact embedded space $\mathbf{xz} \in \mathbb{R}^{64}$. It combines a fast Generation module based on Denoising Diffusion GANs with an Auto-regressive Diffusion Model polishing stage, enabling long-horizon, multi-modal motion while maintaining efficiency. The method is validated on LaFAN1 and artificial datasets, showing motion quality comparable to heavy baselines like AMDM200 but with orders-of-magnitude speedups, and supported by ablations that justify each component (embedded space, DD-GANs, ADM, and polishing). The approach promises practical impact for real-time gaming and VR by delivering diverse, controllable animations at interactive frame rates.

Abstract

Interactive motion synthesis is essential in creating immersive experiences in entertainment applications, such as video games and virtual reality. However, generating animations that are both high-quality and contextually responsive remains a challenge. Traditional techniques in the game industry can produce high-fidelity animations but suffer from high computational costs and poor scalability. Trained neural network models alleviate the memory and speed issues, yet fall short in generating diverse motions. Diffusion models offer diverse motion synthesis with low memory usage, but require expensive reverse diffusion processes. This paper introduces the Accelerated Auto-regressive Motion Diffusion Model (AAMDM), a novel motion synthesis framework designed to achieve quality, diversity, and efficiency together. AAMDM integrates Denoising Diffusion GANs as a fast Generation Module, and an Auto-regressive Diffusion Model as a Polishing Module. Furthermore, AAMDM operates in a lower-dimensional embedded space rather than the full-dimensional pose space, which reduces training complexity and further improves performance. Through comprehensive quantitative analyses and visual comparisons, we show that AAMDM outperforms existing methods in motion quality, diversity, and runtime efficiency. We also demonstrate the effectiveness of each algorithmic component through ablation studies.
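
The control flow described in the abstract (draft a next step in the embedded space, polish it, then decode a full pose) can be sketched roughly as below. This is only a toy illustration under stated assumptions: `generate_draft`, `polish`, `decode`, `rollout`, `POSE_DIM`, and the step counts are hypothetical stand-ins, not the trained networks or settings from the paper. Only the 64-dimensional embedded vector comes from the TL;DR.

```python
import random

EMBED_DIM = 64             # size of the embedded vector xz (from the TL;DR)
POSE_DIM = 2 * EMBED_DIM   # hypothetical full-pose size, larger than EMBED_DIM


def generate_draft(xz_prev, rng):
    """Toy stand-in for the Generation Module: a noisy next-step draft."""
    return [v + rng.gauss(0.0, 0.1) for v in xz_prev]


def polish(xz_draft, xz_prev, steps=2):
    """Toy stand-in for the Polishing Module: a few refinement steps."""
    xz = list(xz_draft)
    for _ in range(steps):
        # Arbitrary toy refinement: average with the previous state.
        xz = [0.5 * (a + b) for a, b in zip(xz, xz_prev)]
    return xz


def decode(xz):
    """Toy stand-in for the learned decoder: embedded vector -> full pose."""
    return list(xz) + list(xz)


def rollout(xz0, num_frames, seed=0):
    """Auto-regressive loop: each polished step feeds the next one."""
    rng = random.Random(seed)
    xz, poses = list(xz0), []
    for _ in range(num_frames):
        draft = generate_draft(xz, rng)   # fast draft of the next step
        xz = polish(draft, xz)            # refine the draft
        poses.append(decode(xz))          # reconstruct a full pose
    return poses


poses = rollout([0.0] * EMBED_DIM, num_frames=5)
print(len(poses), len(poses[0]))
```

The point of the sketch is the ordering of the three stages inside the loop and the fact that only the embedded vector is carried from one frame to the next. The arithmetic inside each stand-in is arbitrary.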

Paper Structure

This paper contains 22 sections, 8 equations, 5 figures, 2 tables, 1 algorithm.

Figures (5)

  • Figure 1: We introduce the Accelerated Auto-regressive Motion Diffusion Model (AAMDM), a novel framework designed to synthesize diverse and high-quality character motions at interactive rates.
  • Figure 2: Overview of AAMDM. AAMDM incorporates three pivotal components for better motion quality and faster inference. Firstly, it models transitions within a low-dimensional embedded space $\mathbf{xz}\in\mathbf{XZ}$. Secondly, the framework features a Generation module, which employs Denoising Diffusion GANs. This module is responsible for efficiently generating initial drafts of motion sequences. Lastly, a Polishing module, which utilizes an Auto-regressive Diffusion Model, refines these initial drafts. A full-pose vector $\mathbf{y}_n$ is then reconstructed from the corresponding embedded vector $\mathbf{xz}_{n}$ using the learned decoder $D^{AE}$.
  • Figure 3: Comparison between motions generated by LMM (top) and AAMDM (bottom). Starting from a similar character pose, LMM is unable to generate diverse motions, while AAMDM can reproduce diverse, complex motions.
  • Figure 4: Visualization of the learned transition results of an artificial Squ-9-Gaussian experiment in 2D. We show that AAMDM outperforms baseline methods in learning the many-to-many distribution mapping in sequential scenarios.
  • Figure 5: The network structure used in AAMDM. We use Mish as the activation function for all networks.
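
For reference, the Mish activation mentioned in the Figure 5 caption is commonly defined as mish(x) = x * tanh(softplus(x)), with softplus(x) = ln(1 + e^x). A minimal pure-Python sketch of that standard definition follows. It is independent of the paper's code, and the helper names are illustrative.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable softplus: ln(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    """Mish activation: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


for value in (-2.0, 0.0, 1.0, 2.0):
    print(value, round(mish(value), 4))
```

Mish is smooth and close to the identity for large positive inputs, while large negative inputs are squashed toward zero.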