UniPhys: Unified Planner and Controller with Diffusion for Flexible Physics-Based Character Control
Yan Wu, Korrawe Karunratanakul, Zhengyi Luo, Siyu Tang
TL;DR
UniPhys targets long-horizon, physics-based character control under multi-modal guidance by unifying planning and control in a single diffusion-based behavior model trained with Diffusion Forcing. The model learns to denoise noisy motion histories, which mitigates prediction errors that accumulate over long rollouts and absorbs discrepancies introduced by the physics simulator, and it generates motion end-to-end conditioned on text, goals, and trajectories. Through guided sampling, it generalizes to unseen control signals without task-specific fine-tuning, and it outperforms prior methods in naturalness, robustness, and adaptability across text-driven control, velocity control, goal reaching, and obstacle avoidance. The work also contributes a large paired state-action-text dataset and a detailed study of test-time denoising strategies, with ablations. By bridging diffusion planning and physics-based control, the approach offers a scalable path toward expressive, long-horizon character animation.
Abstract
Generating natural and physically plausible character motion remains challenging, particularly for long-horizon control with diverse guidance signals. While prior work combines high-level diffusion-based motion planners with low-level physics controllers, these systems suffer from domain gaps that degrade motion quality and require task-specific fine-tuning. To tackle this problem, we introduce UniPhys, a diffusion-based behavior cloning framework that unifies motion planning and control into a single model. UniPhys enables flexible, expressive character motion conditioned on multi-modal inputs such as text, trajectories, and goals. To address accumulated prediction errors over long sequences, UniPhys is trained with the Diffusion Forcing paradigm, learning to denoise noisy motion histories and handle discrepancies introduced by the physics simulator. This design allows UniPhys to robustly generate physically plausible, long-horizon motions. Through guided sampling, UniPhys generalizes to a wide range of control signals, including unseen ones, without requiring task-specific fine-tuning. Experiments show that UniPhys outperforms prior methods in motion naturalness, generalization, and robustness across diverse control tasks.
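To make the Diffusion Forcing idea mentioned above more concrete, the following is a minimal sketch of its core training-time corruption: each frame of a motion sequence receives its own independently sampled noise level, so the model sees histories that are noisy to varying degrees. This is an illustration, not the authors' implementation. The function names, the cosine noise schedule, the number of noise steps, and the array shapes are all assumptions chosen for the example.

```python
import numpy as np

def cosine_alpha_bar(k, num_steps):
    """Cumulative signal level for noise step k (assumed cosine schedule).

    k = 0 keeps the frame clean; k = num_steps is (almost) pure noise.
    """
    return np.cos(0.5 * np.pi * np.asarray(k) / num_steps) ** 2

def noise_sequence(x0, num_steps, rng, k=None):
    """Corrupt each frame of x0 with its own noise level.

    x0: array of shape (T, D), T frames of D-dimensional state-action features
        (hypothetical layout).
    k:  optional per-frame noise steps of shape (T,); sampled independently
        and uniformly when not given.
    Returns the noisy sequence, the per-frame noise steps, and the noise used.
    """
    num_frames = x0.shape[0]
    if k is None:
        # One independent noise level per frame, rather than one per sequence.
        k = rng.integers(0, num_steps + 1, size=num_frames)
    a_bar = cosine_alpha_bar(k, num_steps)[:, None]  # shape (T, 1)
    eps = rng.standard_normal(x0.shape)
    x_noisy = np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps
    return x_noisy, np.asarray(k), eps

rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 4))          # toy "motion" with 8 frames
x_noisy, k, eps = noise_sequence(x0, num_steps=1000, rng=rng)
```

A denoiser trained on such inputs would be asked to recover the clean frames (or the noise) given the per-frame levels `k`. At test time, the same mechanism allows past frames returned by the simulator to be treated as slightly noisy rather than exact, which is the intuition behind denoising noisy motion histories.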
