Perceptive Humanoid Parkour: Chaining Dynamic Human Skills via Motion Matching

Zhen Wu; Xiaoyu Huang; Lujie Yang; Yuanhang Zhang; Koushil Sreenath; Xi Chen; Pieter Abbeel; Rocky Duan; Angjoo Kanazawa; Carmelo Sferrazza; Guanya Shi; C. Karen Liu

TL;DR

Perceptive Humanoid Parkour (PHP) tackles agile parkour for humanoid robots by chaining dynamic human skills through motion matching and a teacher-student training pipeline. It retargets human motions into robot-compatible atomic skills, composes long-horizon reference trajectories via motion matching, trains motion-tracking expert policies with privileged information, and distills them into a single depth-conditioned student policy. The distillation uses a hybrid objective that combines DAgger-style imitation with reinforcement learning, $\mathcal{L}=\lambda_{\text{PPO}}\mathcal{L}_{\text{PPO}}+\lambda_{D}\mathcal{L}_{D}$, where $\mathcal{L}_{\text{PPO}}$ is the PPO (RL) loss, $\mathcal{L}_{D}$ is the DAgger distillation loss, and $\lambda_{\text{PPO}}$ and $\lambda_{D}$ weight the two terms. Real-world experiments on a Unitree G1 demonstrate climbing onto obstacles up to 1.25 m high and continuous multi-obstacle traversal with online adaptation, evidence of effective sim-to-real transfer. Motion matching yields dense, varied motion references that support robust timing and transitions, enabling autonomous, perception-driven skill selection and execution. The work highlights the value of motion matching for long-horizon, expressive humanoid control and shows a scalable path from privileged expert policies to practical depth-based deployment, while noting current limits in semantic scene understanding and manipulation capabilities. Overall, PHP provides a scalable framework for perception-driven, highly dynamic humanoid parkour that has been validated on real hardware.
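
The hybrid objective can be illustrated with a minimal sketch. It assumes that $\mathcal{L}_{D}$ is a mean-squared error between student and teacher actions on states visited by the student (the usual DAgger-style behavior-cloning term) and that $\mathcal{L}_{\text{PPO}}$ is only the clipped-surrogate policy term of PPO. The function names, the clip value of 0.2, and the default weights are hypothetical placeholders, not the paper's implementation or settings.

```python
import math
from typing import Sequence


def distillation_loss(student_actions: Sequence[Sequence[float]],
                      teacher_actions: Sequence[Sequence[float]]) -> float:
    """Assumed form of L_D: mean squared error between student and teacher actions.

    In DAgger-style distillation, states come from rollouts of the *student*
    policy and the teacher (expert) labels each visited state with its action.
    """
    total, count = 0.0, 0
    for a_s, a_t in zip(student_actions, teacher_actions):
        for s, t in zip(a_s, a_t):
            total += (s - t) ** 2
            count += 1
    if count == 0:
        raise ValueError("need at least one action dimension")
    return total / count


def ppo_clip_loss(log_probs_new: Sequence[float],
                  log_probs_old: Sequence[float],
                  advantages: Sequence[float],
                  clip_eps: float = 0.2) -> float:
    """Policy term of PPO: negative clipped surrogate, averaged over samples.

    The value-function and entropy terms of full PPO are omitted in this sketch.
    """
    terms = []
    for lp_new, lp_old, adv in zip(log_probs_new, log_probs_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        terms.append(min(ratio * adv, clipped * adv))
    if not terms:
        raise ValueError("need at least one sample")
    return -sum(terms) / len(terms)  # minimize the negative surrogate objective


def hybrid_loss(l_ppo: float, l_d: float,
                lambda_ppo: float = 1.0, lambda_d: float = 1.0) -> float:
    """L = lambda_PPO * L_PPO + lambda_D * L_D (weights are placeholders)."""
    return lambda_ppo * l_ppo + lambda_d * l_d
```

For example, a student action `[0.0, 1.0]` against a teacher action `[1.0, 1.0]` gives $\mathcal{L}_{D}=0.5$; combined with $\mathcal{L}_{\text{PPO}}=-2.0$ and weights $\lambda_{\text{PPO}}=1$, $\lambda_{D}=2$, the total is $-1.0$. The imitation term keeps the student close to the experts, while the RL term lets it improve where pure imitation of privileged teachers is insufficient under depth-only observations.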

Abstract

While recent advances in humanoid locomotion have achieved stable walking on varied terrains, capturing the agility and adaptivity of highly dynamic human motions remains an open challenge. In particular, agile parkour in complex environments demands not only low-level robustness, but also human-like motion expressiveness, long-horizon skill composition, and perception-driven decision-making. In this paper, we present Perceptive Humanoid Parkour (PHP), a modular framework that enables humanoid robots to autonomously perform long-horizon, vision-based parkour across challenging obstacle courses. Our approach first leverages motion matching, formulated as nearest-neighbor search in a feature space, to compose retargeted atomic human skills into long-horizon kinematic trajectories. This framework enables the flexible composition and smooth transitions of complex skill chains while preserving the elegance and fluidity of dynamic human motions. Next, we train motion-tracking reinforcement learning (RL) expert policies for these composed motions and distill them into a single depth-based, multi-skill student policy using a combination of DAgger and RL. Crucially, the combination of perception and skill composition enables autonomous, context-aware decision-making: using only onboard depth sensing and a discrete 2D velocity command, the robot decides whether to step over, climb onto, vault over, or roll off obstacles of varying geometries and heights, and executes the chosen skill. We validate our framework with extensive real-world experiments on a Unitree G1 humanoid robot, demonstrating highly dynamic parkour skills such as climbing tall obstacles up to 1.25 m (96% of robot height), as well as long-horizon multi-obstacle traversal with closed-loop adaptation to real-time obstacle perturbations.
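
Motion matching as nearest-neighbor search can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's method: each database frame carries a hypothetical feature vector (in game-animation motion matching such features commonly include joint positions, velocities, and a short future root trajectory), distance is a weighted squared Euclidean norm, search is a brute-force linear scan, and clips are chained by re-querying after a fixed number of frames (`hold`). The `Frame` type, the weights, the clip names, and `hold` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Frame:
    clip: str                   # which atomic skill clip the frame belongs to
    index: int                  # frame index inside that clip
    feature: Tuple[float, ...]  # hypothetical feature vector for this frame


def weighted_sq_dist(a: Sequence[float], b: Sequence[float],
                     weights: Sequence[float]) -> float:
    """Weighted squared Euclidean distance in feature space."""
    return sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights))


def motion_match(query: Sequence[float], database: Sequence[Frame],
                 weights: Sequence[float]) -> Frame:
    """Brute-force nearest-neighbor search: the frame whose features best fit the query."""
    return min(database, key=lambda f: weighted_sq_dist(query, f.feature, weights))


def compose(queries: Sequence[Sequence[float]], database: Sequence[Frame],
            weights: Sequence[float], clip_lengths: Dict[str, int],
            hold: int) -> List[Tuple[str, int]]:
    """Chain clips into one long trajectory by re-matching at fixed intervals.

    At each decision step, find the best-matching frame, then play up to `hold`
    consecutive frames of its clip (stopping at the clip's end) before querying again.
    """
    trajectory: List[Tuple[str, int]] = []
    for q in queries:
        best = motion_match(q, database, weights)
        end = min(best.index + hold, clip_lengths[best.clip])
        trajectory.extend((best.clip, i) for i in range(best.index, end))
    return trajectory
```

With a toy two-dimensional feature of (forward speed, obstacle height), a query of `(1.0, 0.0)` selects the start of a `"run"` clip, while `(1.0, 0.9)` selects the start of a `"vault"` clip, so `compose` stitches running frames and vaulting frames into one reference. Practical systems typically normalize features and replace the linear scan with an accelerated search structure.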


Paper Structure (33 sections, 5 equations, 5 figures, 6 tables)

Introduction
Related Works
Perceptive Terrain Traversal for Legged Robots
Humanoid Skill Chaining with Human Motion Data
Adaptive and Agile Long-Horizon Parkour
Overview
Skill Composition via Motion Matching
Basic Motion Matching
Long-Horizon Parkour Trajectory Synthesis
Learning a Highly-Dynamic Visuomotor Policy
Training Expert Policies with Motion Tracking
Distilling a Unified Student Policy with DAgger and RL
Experiments
Real-World Results
Human-Level Agility
...and 18 more sections

Figures (5)

Figure 1: Perceptive Humanoid Parkour (PHP) enables a Unitree G1 humanoid robot to execute highly dynamic, long-horizon parkour behaviors using onboard perception. By composing various agile human skills via motion matching and a teacher-student training pipeline, we train a single multi-skill visuomotor policy capable of complex contact-rich maneuvers including (a) cat-vaulting over a short obstacle followed by dash-vaulting over a higher obstacle at approximately 3 m/s, (b) climbing onto a 1.25 m (96% of robot height) wall, and rolling down, (c) speed-vaulting over an obstacle at approximately 3 m/s, and (d) a 60-second continuous traversal of a complex parkour course with autonomous skill selection and seamless transitions.
Figure 2: Perceptive Humanoid Parkour overview. Atomic parkour skills are composed into long-horizon kinematic reference trajectories via motion matching. Single-skill teacher policies are trained with privileged information using RL-based motion tracking. Multiple teachers are distilled into a single depth-based student policy using a hybrid DAgger and RL objective. This scalable recipe enables zero-shot sim-to-real transfer onto a physical humanoid robot that adaptively traverses complex terrain by autonomously executing highly agile parkour skills using onboard perception.
Figure 3: Diverse variations of composed parkour skills synthesized via motion matching. (a) Different approach distances trigger varying stride phases and entry poses. (b) Diverse locomotion speeds, directions, and durations. (c) Randomized terrain poses and shapes.
Figure 4: Side-by-side comparison of high-climb agility. The robot climbs onto a 1.25 m wall within 3.63 s.
Figure 5: Hardware results demonstrating agile, long-horizon parkour behaviors, including (a) a cat vault, (b) a drop landing from a 1.25 m wall, and (c) a 48-second terrain traversal with online adaptation to real-time obstacle displacement.