
Diffusion Path Alignment for Long-Range Motion Generation and Domain Transitions

Haichao Wang, Alexander Okupnik, Yuxing Han, Gene Wen, Johannes Schneider, Kyriakos Flouris

Abstract

Long-range human movement generation remains a central challenge in computer vision and graphics, and generating coherent transitions across semantically distinct motion domains is largely unexplored. This capability is particularly important for applications such as dance choreography, where movements must fluidly transition across diverse stylistic and semantic motifs. We propose a simple and effective inference-time optimization framework inspired by diffusion-based stochastic optimal control: a control-energy objective that explicitly regularizes the transition trajectories of a pretrained diffusion model. We show that optimizing this objective at inference time yields transitions with high fidelity and temporal coherence. To our knowledge, this is the first work to provide a general framework for controlled long-range human motion generation with explicit transition modeling.

Paper Structure

This paper contains 26 sections, 11 equations, 5 figures, 2 tables, 1 algorithm.

Figures (5)

  • Figure 1: We propose Movement Diffusion Path Alignment (M-DPA) for controlled long-range motion generation. Our method optimizes segment-wise mixing coefficients $\omega$ between paired diffusion conditions at inference time, minimizing a control-energy objective derived from stochastic optimal control. Hard stitching constraints enforce exact temporal continuity between consecutive motion segments. The example illustrates the transition from the general-movement domain ($\omega=0$) to dance ($\omega=1$).
  • Figure 2: Illustration of guided denoising with the additional control-energy objective. The black line illustrates the trajectory in latent space starting at $t=T$; the red line represents the unconditional denoising trajectory with $\epsilon_\theta(x_t,t;\varnothing)$, and the blue line the mixed guided denoising with $\epsilon_\theta(x_t,t;\omega)$. $E_c$ is the control energy minimized so that the guided trajectory stays close to the unguided one.
  • Figure 3: Subplots show movement samples generated by linear interpolation (first row), sine interpolation (second row), and M-DPA (last row). The dashed bounding box marks the transition phase. While the heuristic linear and sine interpolations introduce additional folding of the character, M-DPA produces more coherent transitions.
  • Figure 4: Subplots show the results of the $\omega$ optimization along different denoising steps for different class transitions. Both in-between segments $2$ and $3$ show a similar pattern: $\omega$ peaks at denoising step $7$ and then decreases monotonically, consistently across all class transitions $0\rightarrow1$, $0\rightarrow5$, $0\rightarrow9$.
  • Figure 5: Evolution of control energy for both the Sine interpolation baseline and M-DPA.
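To make the mechanism described above concrete, the following is a minimal sketch of inference-time optimization of a mixing coefficient $\omega$ against a control-energy objective. The denoiser `eps_theta`, the condition names, and the objective weighting `lam` are all illustrative assumptions, not the authors' actual model or hyperparameters; a real implementation would optimize per-segment coefficients through a pretrained diffusion denoiser.

```python
import numpy as np

rng = np.random.default_rng(0)

def eps_theta(x_t, t, cond):
    """Stand-in for a pretrained denoiser epsilon_theta(x_t, t; c).
    A fixed per-condition scaling, purely for illustration."""
    scale = {"uncond": 0.9, "dance": 1.1}[cond]  # hypothetical conditions
    return scale * x_t + 0.01 * t

def mixed_eps(x_t, t, omega):
    # Mix the paired conditional predictions with coefficient omega in [0, 1].
    return (1.0 - omega) * eps_theta(x_t, t, "uncond") \
        + omega * eps_theta(x_t, t, "dance")

def control_energy(omega, x_t, t):
    # E_c: squared deviation of the guided prediction from the unguided one,
    # i.e. the cost of steering away from the model's default trajectory.
    diff = mixed_eps(x_t, t, omega) - eps_theta(x_t, t, "uncond")
    return float(np.sum(diff ** 2))

def optimize_omega(x_t, t, target=1.0, lam=0.2, lr=0.05, steps=50):
    """Trade off pulling omega toward the target domain against control
    energy, via finite-difference gradient descent on a scalar omega."""
    omega = 0.5
    obj = lambda w: control_energy(w, x_t, t) + lam * (w - target) ** 2
    h = 1e-4
    for _ in range(steps):
        grad = (obj(omega + h) - obj(omega - h)) / (2.0 * h)
        omega = float(np.clip(omega - lr * grad, 0.0, 1.0))
    return omega

x_t = rng.standard_normal(8)   # toy latent motion segment at one denoising step
omega_star = optimize_omega(x_t, t=7)
```

The resulting `omega_star` settles between 0 and 1: full guidance toward the target condition incurs maximal control energy, so the optimizer balances domain transfer against staying near the unguided denoising path, mirroring the trade-off illustrated in Figure 2.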