UniPhys: Unified Planner and Controller with Diffusion for Flexible Physics-Based Character Control

Yan Wu, Korrawe Karunratanakul, Zhengyi Luo, Siyu Tang

TL;DR

UniPhys tackles the challenge of long-horizon, physics-based character control under multi-modal guidance by unifying planning and control into a single diffusion-based behavior model trained with Diffusion Forcing. The model denoises noisy histories to mitigate accumulated errors from the physics simulator, enabling end-to-end generation conditioned on text, goals, and trajectories. Through guided sampling, it generalizes to unseen control signals without task-specific fine-tuning and demonstrates superior naturalness, robustness, and adaptability across text-driven control, velocity control, goal reaching, and obstacle avoidance. The work includes a large paired state-action-text dataset and a detailed exploration of test-time denoising strategies and ablations. This approach bridges diffusion planning with physics-based control, offering a scalable path toward expressive, long-horizon character animation.

Abstract

Generating natural and physically plausible character motion remains challenging, particularly for long-horizon control with diverse guidance signals. While prior work combines high-level diffusion-based motion planners with low-level physics controllers, these systems suffer from domain gaps that degrade motion quality and require task-specific fine-tuning. To tackle this problem, we introduce UniPhys, a diffusion-based behavior cloning framework that unifies motion planning and control into a single model. UniPhys enables flexible, expressive character motion conditioned on multi-modal inputs such as text, trajectories, and goals. To address accumulated prediction errors over long sequences, UniPhys is trained with the Diffusion Forcing paradigm, learning to denoise noisy motion histories and handle discrepancies introduced by the physics simulator. This design allows UniPhys to robustly generate physically plausible, long-horizon motions. Through guided sampling, UniPhys generalizes to a wide range of control signals, including unseen ones, without requiring task-specific fine-tuning. Experiments show that UniPhys outperforms prior methods in motion naturalness, generalization, and robustness across diverse control tasks.

Paper Structure

This paper contains 19 sections, 7 equations, 5 figures, 6 tables, 1 algorithm.

Figures (5)

  • Figure 1: UniPhys is a diffusion-based unified planner and controller for physics-based character control, handling diverse tasks with a single model. We showcase its effectiveness in (a) text-driven control with dynamic language instructions, (b) precise velocity control, (c) sparse goal reaching, and (d) adapting to dynamic environments with moving object avoidance.
  • Figure 2: We construct a large-scale paired state-action dataset by tracking a MoCap dataset with the PULSE tracking policy [luo2024universal].
  • Figure 3: Framework overview. (a) The model takes a behavior sequence of length T as input and is conditioned on a CLIP-based text embedding. At training time, each frame is corrupted with a different noise level, and the model learns to predict the clean behavior sequence. (b) At test time, guided denoising with task-specific guidance enables flexible multi-task control. We highlight the flexibility of the test-time denoising conditions and configurations, and the stabilization trick that promotes stable long-horizon autoregressive control.
  • Figure 4: Expressive text-driven control with smooth transition between skills.
  • Figure C.1: User study interface on Amazon Mechanical Turk (AMT).
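
The Figure 3 caption says that, at training time, each frame of the behavior sequence is corrupted with its own noise level. The sketch below illustrates only that corruption step in plain Python. It is a minimal illustration and not the authors' implementation: the linear beta schedule, the number of noise levels, the toy two-feature frames, and the helper names `make_alpha_bars` and `corrupt_per_frame` are all assumptions introduced here.

```python
import math
import random


def make_alpha_bars(num_levels, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal-retention factors for a linear beta schedule.

    The schedule and its endpoints are assumptions for illustration only.
    """
    alpha_bars = []
    running = 1.0
    for k in range(num_levels):
        beta = beta_start + (beta_end - beta_start) * k / max(num_levels - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def corrupt_per_frame(sequence, alpha_bars, rng):
    """Corrupt every frame with an independently sampled noise level.

    Returns the noisy sequence and the noise level index used for each frame.
    """
    noisy, levels = [], []
    for frame in sequence:
        k = rng.randrange(len(alpha_bars))       # independent level per frame
        signal = math.sqrt(alpha_bars[k])        # scale applied to the clean frame
        sigma = math.sqrt(1.0 - alpha_bars[k])   # scale applied to Gaussian noise
        noisy.append([signal * x + sigma * rng.gauss(0.0, 1.0) for x in frame])
        levels.append(k)
    return noisy, levels


rng = random.Random(0)
alpha_bars = make_alpha_bars(num_levels=1000)
clean = [[float(t), 0.5 * t] for t in range(8)]  # T = 8 toy frames, 2 features each
noisy, levels = corrupt_per_frame(clean, alpha_bars, rng)
```

A denoiser trained on such sequences sees histories whose frames carry mixed noise levels, which is the property the abstract relies on when it describes denoising noisy motion histories during long-horizon rollout.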