
OmniControl: Control Any Joint at Any Time for Human Motion Generation

Yiming Xie, Varun Jampani, Lei Zhong, Deqing Sun, Huaizu Jiang

TL;DR

OmniControl tackles the challenge of integrating flexible, joint-level spatial constraints into text-conditioned human motion generation with diffusion models. It introduces a hybrid guidance framework with two parts: spatial guidance, which operates in global coordinates to enforce the control signals, and realism guidance, which injects the spatial constraints into the attention layers to make dense, coherent adjustments across the full body. A single model can control any joint at any time; it shows strong improvements over pelvis-focused baselines, as well as promising results for multi-joint constraints, on HumanML3D and KIT-ML. The work advances practical, constraint-aware motion generation, with applications in interactive robotics, animation, and scene/object integration. It also notes tradeoffs in inference time and in the plausibility of foot-ground contact, which motivate future speedups and physics-aware refinements.
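To make "spatial guidance in global coordinates" concrete, here is a minimal NumPy sketch in the spirit of the analytic spatial guidance described here; it is not the authors' code. It assumes a simplified, hypothetical motion layout (not the actual HumanML3D features): per-frame root displacements plus joint offsets relative to the root. The helper names (`to_global`, `spatial_loss_and_grad`, `spatial_guidance`), the masked squared-distance loss, and the step size `tau` are illustrative assumptions. What the sketch shows is why the gradient is taken through the conversion to global positions: a constraint on a late frame produces gradients on the root displacements of every earlier frame.

```python
import numpy as np


def to_global(motion):
    """Convert a toy relative motion (N, J, 3) to global joint positions.

    Hypothetical layout: motion[:, 0] holds per-frame root displacements,
    motion[:, 1:] holds each joint's offset from the root at that frame.
    """
    root = np.cumsum(motion[:, 0], axis=0)        # (N, 3) global root path
    joints = np.empty_like(motion)
    joints[:, 0] = root
    joints[:, 1:] = root[:, None, :] + motion[:, 1:]
    return joints


def spatial_loss_and_grad(motion, target, mask):
    """Masked mean squared distance to the control signals, plus its gradient.

    target: (N, J, 3) desired global positions; mask: (N, J), 1 where a joint
    is constrained at a frame. The gradient is w.r.t. the relative motion.
    """
    count = max(float(mask.sum()), 1.0)
    diff = (to_global(motion) - target) * mask[..., None]  # 0 if unconstrained
    loss = float((diff ** 2).sum()) / count
    d_global = 2.0 * diff / count                  # dLoss / d(global joints)

    grad = np.empty_like(motion)
    grad[:, 1:] = d_global[:, 1:]                  # offsets map one-to-one
    d_root = d_global.sum(axis=1)                  # all joints move with the root
    # root[n] = sum_{m<=n} disp[m], hence dLoss/ddisp[m] = sum_{n>=m} dLoss/droot[n]
    grad[:, 0] = np.cumsum(d_root[::-1], axis=0)[::-1]
    return loss, grad


def spatial_guidance(mu, target, mask, tau=0.05, iters=10):
    """Gradient steps on the relative motion toward the control signals.

    tau must stay small: root-displacement gradients accumulate over frames.
    """
    mu = mu.copy()
    for _ in range(iters):
        _, grad = spatial_loss_and_grad(mu, target, mask)
        mu -= tau * grad
    return mu
```

For example, constraining joint 2 at frame 7 and the root at frame 3 of an 8-frame sequence and calling `spatial_guidance(mu, target, mask)` moves both the offsets and the early root displacements. Because each root displacement influences all later frames, the gradient on early frames grows with sequence length, so `tau` has to shrink for longer sequences to keep the iteration stable.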

Abstract

We present a novel approach named OmniControl for incorporating flexible spatial control signals into a text-conditioned human motion generation model based on the diffusion process. Unlike previous methods that can only control the pelvis trajectory, OmniControl can incorporate flexible spatial control signals over different joints at different times with only one model. Specifically, we propose analytic spatial guidance that ensures the generated motion can tightly conform to the input control signals. At the same time, realism guidance is introduced to refine all the joints to generate more coherent motion. Both the spatial and realism guidance are essential, and they are highly complementary for balancing control accuracy and motion realism. By combining them, OmniControl generates motions that are realistic, coherent, and consistent with the spatial constraints. Experiments on the HumanML3D and KIT-ML datasets show that OmniControl not only achieves significant improvement over state-of-the-art methods on pelvis control but also shows promising results when incorporating the constraints over other joints.
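The residual-injection idea behind realism guidance can be sketched in a few lines of NumPy. This is a toy, ControlNet-style illustration under stated assumptions, not the authors' architecture: a stack of single-head attention layers stands in for the motion diffusion model, a copy of those layers acts as the control branch, the spatial signal is assumed to be already embedded to the feature width (`control_emb`), and each branch output reaches the base features through a zero-initialized projection. All names (`attention_layer`, `make_layer`, `realism_guided_forward`) are hypothetical.

```python
import numpy as np


def attention_layer(h, w):
    """Toy single-head self-attention with a residual connection; h is (N, d)."""
    q, k, v = h @ w["q"], h @ w["k"], h @ w["v"]
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=-1, keepdims=True)        # softmax over frames
    return h + attn @ v


def make_layer(rng, d):
    """Random query/key/value weights for one toy layer."""
    return {name: rng.normal(scale=d ** -0.5, size=(d, d)) for name in "qkv"}


def realism_guided_forward(x, control_emb, base_layers, ctrl_layers, projs):
    """Base attention stack plus per-layer residuals from a control branch.

    x: (N, d) motion features; control_emb: (N, d) embedded spatial signal.
    Each control-branch output is mapped by a linear projection (weights and
    bias start at zero) and added to the base features of the same layer.
    """
    h, c = x, x + control_emb
    for base_w, ctrl_w, proj in zip(base_layers, ctrl_layers, projs):
        c = attention_layer(c, ctrl_w)                          # control branch
        h = attention_layer(h, base_w) + c @ proj["w"] + proj["b"]  # residual
    return h
```

With all projections at zero, `realism_guided_forward` returns exactly what the base stack alone would, so in this sketch the guided model starts out identical to the pretrained one and only the residual pathway needs to learn how to densely perturb the whole-body features.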

Paper Structure

This paper contains 26 sections, 4 equations, 8 figures, 10 tables, and 1 algorithm.

Figures (8)

  • Figure 1: OmniControl can generate realistic human motions given a text prompt and flexible spatial control signals. Darker color indicates later frames in the sequence. The green line or points indicate the input control signals. Best viewed in color.
  • Figure 2: Overview of OmniControl. Our model generates human motions from the text prompt and spatial control signal. At each denoising diffusion step, the model takes the text prompt and a noised motion sequence $\mathbf{x}_t$ as input and estimates the clean motion $\mathbf{x}_0$. To incorporate flexible spatial control signals into the generation process, a hybrid guidance consisting of realism and spatial guidance encourages the motion to conform to the control signals while remaining realistic.
  • Figure 3: Detailed illustration of our proposed spatial guidance. The spatial guidance can effectively enforce the controlled joints to adhere to the input control signals.
  • Figure 4: Detailed illustration of our proposed realism guidance. The realism guidance outputs the residuals w.r.t. the features in each attention layer of the motion diffusion model. These residuals can directly perturb the whole-body motion densely and implicitly.
  • Figure 5: Visual comparisons of the ablation designs, our full model, and the baseline GMD.
  • ...and 3 more figures
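The sampling loop summarized in Figure 2 can be sketched as a standard DDPM reverse step in which the network predicts the clean motion and the posterior mean is then adjusted by a guidance function. Here `predict_x0` and `guide` are hypothetical callables standing in for the text-conditioned denoiser (with realism guidance inside it) and for spatial guidance; the linear beta schedule and the choice to apply guidance to the posterior mean are assumptions for illustration. The mean uses the usual DDPM coefficients, $\mu_t = \frac{\sqrt{\bar\alpha_{t-1}}\,\beta_t}{1-\bar\alpha_t}\hat{\mathbf{x}}_0 + \frac{\sqrt{\alpha_t}\,(1-\bar\alpha_{t-1})}{1-\bar\alpha_t}\mathbf{x}_t$.

```python
import numpy as np


def linear_betas(T, beta_start=1e-4, beta_end=0.02):
    """Assumed linear noise schedule."""
    return np.linspace(beta_start, beta_end, T)


def guided_ddpm_step(x_t, t, predict_x0, guide, betas, rng):
    """One reverse step x_t -> x_{t-1}, with guidance applied to the mean."""
    alphas = 1.0 - betas
    abar = np.cumprod(alphas)                 # recomputed per call for brevity
    abar_prev = abar[t - 1] if t > 0 else 1.0
    x0_hat = predict_x0(x_t, t)               # network estimates the clean motion
    coef_x0 = np.sqrt(abar_prev) * betas[t] / (1.0 - abar[t])
    coef_xt = np.sqrt(alphas[t]) * (1.0 - abar_prev) / (1.0 - abar[t])
    mu = coef_x0 * x0_hat + coef_xt * x_t     # DDPM posterior mean
    mu = guide(mu, t)                         # e.g. spatial guidance on the mean
    if t == 0:
        return mu                             # no noise on the final step
    var = betas[t] * (1.0 - abar_prev) / (1.0 - abar[t])
    return mu + np.sqrt(var) * rng.standard_normal(x_t.shape)


def sample(shape, predict_x0, guide, T=50, seed=0):
    """Run the full reverse process from pure Gaussian noise."""
    rng = np.random.default_rng(seed)
    betas = linear_betas(T)
    x = rng.standard_normal(shape)
    for t in reversed(range(T)):
        x = guided_ddpm_step(x, t, predict_x0, guide, betas, rng)
    return x
```

At $t = 0$ the coefficients reduce to $\mu_0 = \hat{\mathbf{x}}_0$, so with an identity `guide` the sample ends exactly at the network's final estimate; any guidance applied to the mean shifts that endpoint directly.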