PHUMA: Physically-Grounded Humanoid Locomotion Dataset

Kyungmin Lee, Sibeen Kim, Minho Park, Hyunseung Kim, Dongyoon Hwang, Hojoon Lee, Jaegul Choo

TL;DR

PHUMA addresses two problems in humanoid motion data, scarcity and physical artifacts, by combining large-scale human video with physics-aware curation and a physics-constrained retargeting method, PhySINK. The two-stage data pipeline (curation, then retargeting) yields 73 hours of physically plausible humanoid motion across 76 thousand clips. Policies trained on PHUMA with MaskedMimic-based PPO outperform those trained on AMASS and Humanoid-X in both full-motion imitation of unseen motions and pelvis-only path following on the Unitree G1 and H1-2, which supports the value of physically grounded data for scalable humanoid control.

Abstract

Motion imitation is a promising approach for humanoid locomotion, enabling agents to acquire humanlike behaviors. Existing methods typically rely on high-quality motion capture datasets such as AMASS, but these are scarce and expensive, limiting scalability and diversity. Recent studies attempt to scale data collection by converting large-scale internet videos, exemplified by Humanoid-X. However, they often introduce physical artifacts such as floating, penetration, and foot skating, which hinder stable imitation. In response, we introduce PHUMA, a Physically-grounded HUMAnoid locomotion dataset that leverages human video at scale while addressing physical artifacts through careful data curation and physics-constrained retargeting. PHUMA enforces joint limits, ensures ground contact, and eliminates foot skating, producing motions that are both large-scale and physically reliable. We evaluate PHUMA in two settings: (i) imitation of unseen motions from self-recorded test videos and (ii) path following with pelvis-only guidance. In both cases, PHUMA-trained policies outperform those trained on Humanoid-X and AMASS, achieving significant gains in imitating diverse motions. The code is available at https://davian-robotics.github.io/PHUMA.
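The three foot-related artifacts named in the abstract (floating, penetration, and foot skating) can be made concrete with a small per-frame check. The following is a minimal sketch, not the PHUMA curation code: the function name `check_artifacts`, the `ArtifactReport` type, the flat ground at z = 0, and all three thresholds are assumptions chosen for illustration. Joint-limit checks are omitted, and a frame with no foot near the ground is counted as floating even if it belongs to a legitimate flight phase such as running, so a practical filter would likely also consider how long the condition persists.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]

# Hypothetical thresholds for illustration only; these are not values from the paper.
CONTACT_HEIGHT = 0.02   # metres: a foot at or below this height counts as in contact
PENETRATION_TOL = 0.01  # metres: depth below the ground tolerated before flagging
SKATE_SPEED = 0.10      # metres/second: horizontal speed flagged for a contacting foot


@dataclass
class ArtifactReport:
    floating_frames: int
    penetration_frames: int
    skating_frames: int


def check_artifacts(foot_positions: Sequence[Sequence[Vec3]], fps: float) -> ArtifactReport:
    """Count frames showing each artifact.

    foot_positions[t][k] is the (x, y, z) position of foot k at frame t,
    with z pointing up and a flat ground plane assumed at z = 0.
    """
    floating = penetration = skating = 0
    for t, feet in enumerate(foot_positions):
        lowest = min(p[2] for p in feet)
        if lowest > CONTACT_HEIGHT:      # no foot is near the ground
            floating += 1
        if lowest < -PENETRATION_TOL:    # some foot is below the ground
            penetration += 1
        if t == 0:
            continue                     # skating needs a previous frame
        for prev, cur in zip(foot_positions[t - 1], feet):
            in_contact = prev[2] <= CONTACT_HEIGHT and cur[2] <= CONTACT_HEIGHT
            speed = math.hypot(cur[0] - prev[0], cur[1] - prev[1]) * fps
            if in_contact and speed > SKATE_SPEED:
                skating += 1             # count the frame once, even if both feet slide
                break
    return ArtifactReport(floating, penetration, skating)


# Toy clip at 10 fps with two feet: stand, stand, slide, float, penetrate.
clip = [
    [(0.00, 0.0, 0.00), (0.2, 0.0, 0.0)],
    [(0.00, 0.0, 0.00), (0.2, 0.0, 0.0)],
    [(0.05, 0.0, 0.00), (0.2, 0.0, 0.0)],   # left foot slides 0.05 m while grounded
    [(0.05, 0.0, 0.30), (0.2, 0.0, 0.3)],   # both feet in the air
    [(0.05, 0.0, -0.05), (0.2, 0.0, 0.0)],  # left foot 5 cm below the ground
]
report = check_artifacts(clip, fps=10.0)
print(report)
# → ArtifactReport(floating_frames=1, penetration_frames=1, skating_frames=1)
```

Counting frames rather than returning a single pass/fail flag keeps the sketch usable for both roles the abstract describes: a curation step could discard clips whose counts exceed some budget, while a retargeting objective could penalize the same quantities in continuous form.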

Paper Structure

This paper contains 27 sections, 9 equations, 9 figures, 11 tables.

Figures (9)

  • Figure 1: Physical reliability of Humanoid-X vs. PHUMA. Each column illustrates one of four failure modes: joint violation, floating, penetration, and skating. Humanoid-X [mao2025humanoidx] (top row) often exhibits these issues due to direct video-to-motion conversion, while PHUMA (bottom row) mitigates these violations through careful data curation and physically grounded retargeting.
  • Figure 2: Overview of datasets and performance. PHUMA is both large-scale and physically reliable, which translates into higher success rates in motion imitation and pelvis path following. (a) Feasible and infeasible human motion sources in each dataset. (b) Physical reliability, with AMASS retargeted using a standard learning-based inverse kinematics method. (c) Success rate on unseen motions. (d) Success rate in path-following. Results are reported on the Unitree G1 humanoid.
  • Figure 3: Overview of the PHUMA pipeline. Our four-stage pipeline for motion imitation learning includes: (1) Motion Curation, where we filter out problematic motions from a diverse dataset; (2) Motion Retargeting, where the filtered motions are retargeted to the humanoid using PhySINK, incorporating a series of losses; (3) Policy Learning, where a policy is trained to imitate the retargeted motions; and (4) Inference, where the trained policy is used to control the humanoid, enabling it to imitate motions from unseen videos processed by a video-to-motion model.
  • Figure 4: Common physical artifacts in motion retargeting. From left to right: Motion Mismatch, Joint Violation, Floating, Penetration, and Skating.
  • Figure 5: Path following on running motion. We visualize the robot's trajectory during a running motion, with the target pelvis path shown as a green line. The top row presents results from a policy trained on AMASS, while the bottom row presents results from a policy trained on PHUMA.
  • ...and 4 more figures