
PhysMoDPO: Physically-Plausible Humanoid Motion with Preference Optimization

Yangsong Zhang, Anujith Muraleedharan, Rikhat Akizhanov, Abdul Ahad Butt, Gül Varol, Pascal Fua, Fabio Pizzati, Ivan Laptev

Abstract

Recent progress in text-conditioned human motion generation has been largely driven by diffusion models trained on large-scale human motion data. Building on this progress, recent methods attempt to transfer such models to character animation and real robot control by applying a Whole-Body Controller (WBC) that converts diffusion-generated motions into executable trajectories. While WBC trajectories become physically compliant, they may deviate substantially from the original motion. To address this issue, we propose PhysMoDPO, a Direct Preference Optimization framework. Unlike prior work that relies on hand-crafted physics-aware heuristics such as foot-sliding penalties, we integrate the WBC into our training pipeline and optimize the diffusion model so that the WBC output complies with both physics and the original text instructions. To train PhysMoDPO, we use physics-based and task-specific rewards to assign preferences to synthesized trajectories. Our extensive experiments on text-to-motion and spatial control tasks show that PhysMoDPO consistently improves both physical realism and task-related metrics on simulated robots. Moreover, we demonstrate that PhysMoDPO yields significant improvements in zero-shot motion transfer in simulation and in real-world deployment on a G1 humanoid robot.
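To make the preference objective concrete, the snippet below computes the standard DPO loss for a single (chosen, rejected) pair. It is a minimal sketch under stated assumptions: the log-likelihoods of the preferred and dispreferred motions under the finetuned model and a frozen reference copy are taken as given (for a diffusion generator these are usually approximated, for example through denoising losses), and the abstract does not specify PhysMoDPO's exact formulation. The helper name `dpo_loss` and the default `beta` are hypothetical.

```python
import math


def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (chosen, rejected) pair.

    Hypothetical helper: inputs are log-likelihoods of the preferred and
    dispreferred motions under the model being finetuned and under a frozen
    reference copy of the pretrained generator.
    """
    # Implicit reward of each sample: how much more likely the finetuned
    # model makes it relative to the reference model.
    chosen_margin = logp_chosen - ref_logp_chosen
    rejected_margin = logp_rejected - ref_logp_rejected
    z = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(z)) = log(1 + exp(-z)), split by sign to avoid overflow.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


# Before any finetuning the two models agree, so the loss equals log(2).
print(dpo_loss(0.0, 0.0, 0.0, 0.0))
```

The loss falls below log(2) as the finetuned model raises the relative likelihood of the preferred motion and rises above it when the dispreferred motion is favored instead.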

Paper Structure (16 sections, 11 equations, 7 figures, 10 tables)

Figures (7)

  • Figure 1: PhysMoDPO generates motions that follow textual instructions while respecting physical constraints. Compared to prior methods, our approach produces motions that remain stable and physically realistic when deployed on the Unitree G1 robot.
  • Figure 2: Overview of PhysMoDPO. Given a conditioning signal (text and optional joint controls), we sample multiple motions $X$ from a pretrained generator. A fixed physics-based tracking policy then projects each sample into a simulated trajectory $X'$. We compute physics rewards and task rewards on $X'$, construct preference pairs, and finetune the generator with DPO. This generation and finetuning procedure can be iterated.
  • Figure 3: Visual comparison with SMPL simulation. Top: MaskedMimic tessler2024maskedmimic, MotionStreamer xiao2025motionstreamer and PhysMoDPO on the text-to-motion generation task on the HumanML3D guo2022generating dataset. Bottom: visual results of the spatial-text control task on the HumanML3D guo2022generating (left) and OMOMO li2023object (right) datasets. Red balls are the input spatial control signals, and red boxes highlight samples that fail to follow the control or lose balance.
  • Figure 4: Visual results on Unitree G1 robot. Our deployed motion model enables the robot to move in a physically-realistic way while following input text instructions.
  • Figure 5: User study. Comparison of real-robot motion sequences generated by PhysMoDPO, MaskedMimic tessler2024maskedmimic and OmniControl xie2023omnicontrol. PhysMoDPO outperforms both competitors in text adherence, motion smoothness and overall stability.
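The preference-construction step in the Figure 2 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's documented procedure: each sampled motion $X$ is assumed to have already been rolled out by the tracking policy into $X'$, with its physics and task rewards available as numbers. The `Candidate` container, the weighted-sum aggregation, the weights, and the best-versus-worst pairing rule with a `min_gap` threshold are all hypothetical choices.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Candidate:
    """One sampled motion X after projection by the tracking policy into X'.

    Hypothetical container: both rewards are assumed to be computed on the
    simulated trajectory X'.
    """
    sample_id: int
    physics_reward: float  # e.g. higher when X' stays close to X and balanced
    task_reward: float     # e.g. text adherence or control-target accuracy


def combined_reward(c: Candidate, w_physics: float = 1.0,
                    w_task: float = 1.0) -> float:
    # Assumed aggregation: a weighted sum of the two reward terms.
    return w_physics * c.physics_reward + w_task * c.task_reward


def make_preference_pair(candidates: List[Candidate],
                         min_gap: float = 0.0) -> Optional[Tuple[int, int]]:
    """Return (chosen_id, rejected_id) from the highest- and lowest-scoring samples.

    Returns None when there are fewer than two candidates or when the reward
    gap does not exceed min_gap, i.e. no clear preference (hypothetical rule).
    """
    if len(candidates) < 2:
        return None
    ranked = sorted(candidates, key=combined_reward, reverse=True)
    best, worst = ranked[0], ranked[-1]
    if combined_reward(best) - combined_reward(worst) <= min_gap:
        return None
    return best.sample_id, worst.sample_id


# Three candidates for one prompt; sample 2 scores highest, sample 1 lowest.
cands = [Candidate(0, 0.9, 0.2), Candidate(1, 0.4, 0.3), Candidate(2, 0.8, 0.7)]
print(make_preference_pair(cands))
```

Pairing the best and worst samples keeps the preference signal strong; the `min_gap` check skips prompts where all samples score about the same, which would otherwise produce near-arbitrary preferences.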