RecMoDiffuse: Recurrent Flow Diffusion for Human Motion Generation

Mirgahney Mohamed, Harry Jake Cunningham, Marc P. Deisenroth, Lourdes Agapito

TL;DR

RecMoDiffuse addresses the challenge of generating coherent long-horizon human motion by introducing a recurrent diffusion framework that enforces temporal consistency through a recurrent normalizing flow and a diffusion-grid. This approach extends diffusion models to the temporal dimension, allowing inference to proceed autoregressively and enabling staircase sampling to reduce computation. Empirical results on KIT-ML and HumanML3D demonstrate competitive quality with state-of-the-art methods while achieving substantial speedups during inference. The work highlights a practical path to temporally coherent diffusion-based motion generation and lays groundwork for latent-space extensions.
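The diffusion-grid can be pictured as a two-dimensional array indexed by motion frame t and reverse diffusion step s. The following is a minimal sketch built on an assumption taken from the Figure 5 (B) caption rather than from the paper's equations: cell (t, s) depends only on (t, s - 1), the previous reverse step of the same frame, and on (t - 1, s), the previous frame at the same step. Under that assumption, cells on the same anti-diagonal t + s are independent, so the grid can be swept in a staircase (wavefront) order of T + S - 1 sequential stages instead of T * S. The helper names `staircase_schedule` and `respects_dependencies` are hypothetical. The sketch counts sequential stages, not network evaluations, so it illustrates the dependency structure rather than reproducing the paper's staircase sampler or its reported savings.

```python
from collections import defaultdict


def staircase_schedule(num_frames: int, num_steps: int) -> list[list[tuple[int, int]]]:
    """Group diffusion-grid cells (frame t, reverse step s) into stages.

    Hypothetical illustration: assumes cell (t, s) depends only on
    (t, s - 1), the previous reverse step of the same frame, and on
    (t - 1, s), the previous frame at the same reverse step.
    Cells sharing the same t + s lie on one anti-diagonal and are independent.
    """
    stages = defaultdict(list)
    for t in range(num_frames):
        for s in range(num_steps):
            stages[t + s].append((t, s))
    return [stages[d] for d in range(num_frames + num_steps - 1)]


def respects_dependencies(schedule: list[list[tuple[int, int]]]) -> bool:
    """Check that every cell is scheduled strictly after both of its parents."""
    stage_of = {cell: i for i, stage in enumerate(schedule) for cell in stage}
    for (t, s), i in stage_of.items():
        for parent in ((t, s - 1), (t - 1, s)):
            if parent in stage_of and stage_of[parent] >= i:
                return False
    return True


schedule = staircase_schedule(num_frames=4, num_steps=3)
print(len(schedule))  # 6 sequential stages instead of 4 * 3 = 12
print(schedule[2])    # [(0, 2), (1, 1), (2, 0)]
```

With 4 frames and 3 reverse steps, stage 2 holds cells (0, 2), (1, 1) and (2, 0); all of their assumed parents sit in earlier stages, so under this dependency model they could be evaluated in parallel.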

Abstract

Human motion generation is of paramount importance in computer animation. It is a challenging generative temporal modelling task due to the vast space of possible human motions, high human sensitivity to motion coherence, and the difficulty of accurately generating fine-grained motions. Recently, diffusion methods have been proposed for human motion generation due to their high sample quality and expressiveness. However, the generated sequences still suffer from motion incoherence, are limited to short durations and simple motions, and take considerable time during inference. To address these limitations, we propose RecMoDiffuse: Recurrent Flow Diffusion, a new recurrent diffusion formulation for temporal modelling. Previous work applies diffusion to the whole sequence without any temporal dependency, an approach that inherently makes temporal consistency hard to achieve. In contrast, our method explicitly enforces temporal constraints by means of normalizing flow models in the diffusion process and thereby extends diffusion to the temporal dimension. We demonstrate the effectiveness of RecMoDiffuse in the temporal modelling of human motion. Our experiments show that RecMoDiffuse achieves results comparable to state-of-the-art methods while generating coherent motion sequences and reducing the computational overhead at inference.
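The central mechanism, a normalizing flow that carries temporal dependencies into the diffusion process, can be illustrated with a minimal sketch. Its assumptions are not taken from the paper: a plain tanh RNN stands in for the LSTM of Figure 5 (A), each frame is transformed by an elementwise affine map whose shift and log-scale are predicted from the recurrent state, and the class name `RecurrentAffineFlow` is hypothetical. The sketch shows why such a map is convenient: latent z_t depends only on frames up to t, so the Jacobian is block lower-triangular, the log-determinant is just the negated sum of the log-scales, and the inverse exists in closed form but must run frame by frame, in line with the autoregressive inference described in the TL;DR.

```python
import numpy as np


class RecurrentAffineFlow:
    """Toy recurrent normalizing flow over a motion sequence x of shape (T, D).

    Hypothetical sketch: a plain tanh RNN stands in for the LSTM, and each
    frame gets an elementwise affine map whose shift and log-scale are
    predicted from a hidden state summarising the previous frames.
    """

    def __init__(self, dim: int, hidden: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W_x = rng.normal(scale=0.1, size=(hidden, dim))
        self.W_h = rng.normal(scale=0.1, size=(hidden, hidden))
        self.W_mu = rng.normal(scale=0.1, size=(dim, hidden))
        self.W_s = rng.normal(scale=0.1, size=(dim, hidden))
        self.hidden = hidden

    def _step(self, h, x_prev):
        # Update the recurrent state with the previous (clean) frame,
        # then predict this frame's shift and log-scale from it.
        h = np.tanh(self.W_x @ x_prev + self.W_h @ h)
        return h, self.W_mu @ h, self.W_s @ h

    def forward(self, x):
        """Map frames x -> latents z and return log|det dz/dx|."""
        T, D = x.shape
        h, x_prev = np.zeros(self.hidden), np.zeros(D)
        z, log_det = np.empty_like(x), 0.0
        for t in range(T):
            h, mu, log_s = self._step(h, x_prev)
            z[t] = (x[t] - mu) * np.exp(-log_s)
            log_det -= log_s.sum()  # diagonal block of a triangular Jacobian
            x_prev = x[t]
        return z, log_det

    def inverse(self, z):
        """Map latents z -> frames x, one frame at a time (autoregressive)."""
        T, D = z.shape
        h, x_prev = np.zeros(self.hidden), np.zeros(D)
        x = np.empty_like(z)
        for t in range(T):
            h, mu, log_s = self._step(h, x_prev)
            x[t] = z[t] * np.exp(log_s) + mu
            x_prev = x[t]  # the reconstructed frame conditions the next one
        return x


flow = RecurrentAffineFlow(dim=6, hidden=16)
x = np.random.default_rng(1).normal(size=(10, 6))
z, log_det = flow.forward(x)
print(np.allclose(flow.inverse(z), x))  # True
```

Conditioning each frame's transform on a hidden state, rather than on the whole sequence, is what keeps the map invertible with a cheap determinant while still letting earlier frames shape later ones.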

Paper Structure (36 sections, 33 equations, 6 figures, 6 tables, 3 algorithms)

Figures (6)

  • Figure 1: Our method can generate high-quality and diverse motions from a text prompt. Darker colours indicate poses later in time.
  • Figure 2: Method overview: RecMoDiffuse explicitly enforces temporal constraints in the diffusion process; both the forward and reverse processes follow temporal constraints. We propose alternating training (left), in which we alternate between flow training on clean samples (flow-only) and diffusion with a fixed flow (diffusion-only and diffusion-flow). During inference (right), our design allows us to skip diffusion steps, which considerably reduces inference cost compared to baselines.
  • Figure 3: Qualitative comparison with state-of-the-art methods on HumanML3D Guo_2022_CVPR. We visualize motion results and real references for three text prompts. Our method's generations match the textual descriptions.
  • Figure 4: Qualitative results on the HumanML3D dataset. We compare our method with MotionDiffuse zhang2022motiondiffuse and visualize two examples for each prompt. Our method achieves results comparable to MotionDiffuse.
  • Figure 5: (A) Our normalizing flow transformation network augmented with an LSTM Hochreiter_01book (recurrent normalizing flow). (B) Forward and reverse processes of our diffusion model; both depend on the previous temporal and diffused steps.
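The alternating training described in the Figure 2 caption can be sketched as a phase table recording which parameters receive gradient updates. This is a minimal sketch, assuming a fixed number of optimiser steps per phase and a cyclic order flow-only, diffusion-only, diffusion-flow; `Phase`, `PHASES`, `phase_for_step` and `steps_per_phase` are hypothetical names. The caption states only that the flow is trained on clean samples in the flow-only phase and kept fixed in the two diffusion phases. It does not say how diffusion-only and diffusion-flow differ, so the sketch gives them identical flags.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    name: str
    train_flow: bool       # update the normalizing-flow parameters?
    train_diffusion: bool  # update the diffusion (denoiser) parameters?


# Hypothetical phase table following the Figure 2 caption: the flow is trained
# alone on clean samples, then kept fixed while the diffusion model trains.
PHASES = (
    Phase("flow-only", train_flow=True, train_diffusion=False),
    Phase("diffusion-only", train_flow=False, train_diffusion=True),
    Phase("diffusion-flow", train_flow=False, train_diffusion=True),
)


def phase_for_step(step: int, steps_per_phase: int) -> Phase:
    """Cycle through PHASES, spending steps_per_phase optimiser steps in each."""
    return PHASES[(step // steps_per_phase) % len(PHASES)]


for step in (0, 400, 800, 1200):
    print(step, phase_for_step(step, steps_per_phase=400).name)
```

In a training loop, the two flags would select which parameter groups the optimiser updates at a given step, so the flow stays frozen whenever a diffusion phase is active.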