Guided Motion Diffusion for Controllable Human Motion Synthesis
Korrawe Karunratanakul, Konpat Preechakul, Supasorn Suwajanakorn, Siyu Tang
TL;DR
This work addresses the problem of incorporating spatial constraints into diffusion-based human motion synthesis. It introduces Guided Motion Diffusion (GMD), which pairs a scalar goal function $G_x(\cdot)$ with two techniques: emphasis projection, which strengthens coherence between the trajectory and the local poses, and dense signal propagation, which turns sparse spatial cues into dense conditioning. GMD supports three spatially guided tasks (trajectory-conditioned generation, keyframe-conditioned generation, and obstacle avoidance) while achieving text-to-motion performance that improves on state-of-the-art baselines. The approach uses a two-stage pipeline and a learned denoiser to counter guidance biases and propagate constraints through the diffusion steps, producing coherent, controllable motion without retraining for each spatial objective. These capabilities are relevant to animation, gaming, and VR, where real-time or near-real-time controlled motion synthesis in 3D environments is increasingly needed.
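To make the role of a scalar goal function concrete, here is a minimal toy sketch of gradient-based guidance toward keyframes. It is an illustration under stated assumptions, not the GMD implementation: the squared-distance goal, the helper names (`goal`, `goal_grad`, `guided_step`), the step size, and the 2D trajectory layout are all hypothetical, and the learned denoiser is left out entirely. In a diffusion sampler, an update of this kind would be applied inside the reverse steps rather than once on a finished sample.

```python
import numpy as np

def goal(traj, keyframes):
    """Hypothetical scalar goal: summed squared distance to the keyframe targets."""
    return sum(float(np.sum((traj[t] - np.asarray(p)) ** 2))
               for t, p in keyframes.items())

def goal_grad(traj, keyframes):
    """Analytic gradient of `goal`; it is nonzero only at the keyframe indices."""
    grad = np.zeros_like(traj)
    for t, p in keyframes.items():
        grad[t] = 2.0 * (traj[t] - np.asarray(p))
    return grad

def guided_step(traj, keyframes, scale=0.25):
    """One gradient-descent step on the goal (assumed step size)."""
    return traj - scale * goal_grad(traj, keyframes)

# Toy data: 60 frames of a noisy 2D root trajectory and three sparse keyframes.
rng = np.random.default_rng(0)
traj = rng.normal(size=(60, 2))
keyframes = {0: (0.0, 0.0), 30: (1.0, 2.0), 59: (3.0, 0.0)}

before = goal(traj, keyframes)
after = goal(guided_step(traj, keyframes), keyframes)

# Count how many frames receive any gradient signal at all.
touched = int(np.count_nonzero(np.any(goal_grad(traj, keyframes) != 0, axis=1)))
print(f"goal before: {before:.3f}, after: {after:.3f}, frames touched: {touched}")
```

In this toy setup only the keyframe frames receive a gradient, which illustrates why a sparse signal is easy to lose and why converting it into a denser signal is useful.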
Abstract
Denoising diffusion models have shown great promise in human motion synthesis conditioned on natural language descriptions. However, integrating spatial constraints, such as pre-defined motion trajectories and obstacles, remains a challenge despite being essential for bridging the gap between isolated human motion and its surrounding environment. To address this issue, we propose Guided Motion Diffusion (GMD), a method that incorporates spatial constraints into the motion generation process. Specifically, we propose an effective feature projection scheme that manipulates the motion representation to enhance the coherency between spatial information and local poses. Together with a new imputation formulation, the generated motion can reliably conform to spatial constraints such as global motion trajectories. Furthermore, given sparse spatial constraints (e.g., sparse keyframes), we introduce a new dense guidance approach to turn a sparse signal, which is susceptible to being ignored during the reverse steps, into denser signals to guide the generated motion to the given constraints. Our extensive experiments justify the development of GMD, which achieves a significant improvement over state-of-the-art methods in text-based motion generation while allowing control of the synthesized motions with spatial constraints.
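The abstract does not spell out the feature projection, so the following is only a sketch of one plausible reading: scale the trajectory-related features by an emphasis factor, then mix all features with a fixed random invertible matrix so that the trajectory carries more weight in the transformed representation. The factor of 10, the feature indices, the normalization by the feature count, and the function names are assumptions made for illustration.

```python
import numpy as np

def make_projection(num_features, traj_idx, emphasis=10.0, seed=0):
    """Build an assumed projection P = A @ diag(b) / N and its inverse.

    A is a fixed random Gaussian matrix; b is 1 everywhere except at the
    trajectory indices, where it equals `emphasis`.
    """
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(num_features, num_features))
    b = np.ones(num_features)
    b[traj_idx] = emphasis
    P = (A * b) / num_features  # broadcasting scales column j of A by b[j]
    return P, np.linalg.inv(P)

def project(x, P):
    """Map motion features of shape (frames, features) into the projected space."""
    return x @ P.T

def unproject(z, P_inv):
    """Map projected features back to the original representation."""
    return z @ P_inv.T

# Toy usage: 8 features per frame, with indices 0 and 1 standing in for the trajectory.
P, P_inv = make_projection(num_features=8, traj_idx=[0, 1])
x = np.random.default_rng(1).normal(size=(5, 8))
z = project(x, P)
x_rec = unproject(z, P_inv)
print("max round-trip error:", float(np.max(np.abs(x_rec - x))))
```

Because the map is linear and invertible, a model could be trained and sampled in the projected space and its output mapped back with `unproject`; whether GMD does exactly this is not stated in the text above.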
