APEX: Learning Adaptive High-Platform Traversal for Humanoid Robots

Yikai Wang, Tingxuan Leng, Changyi Lin, Shiqi Liu, Shir Simon, Bingqing Chen, Jonathan Francis, Ding Zhao

TL;DR

APEX tackles humanoid high-platform traversal by learning six context-conditioned skills and distilling them into a single perceptive policy. Its key component is the generalized ratchet progress reward, which maintains a best-so-far task state and penalizes non-improving steps, giving dense, velocity-free supervision that supports safe exploration of contact-rich maneuvers. The method pairs LiDAR-based elevation mapping with two sim-to-real measures (artifact modeling during training, and map filtering and inpainting during deployment) to narrow the perception gap, and uses a two-stage distillation (teacher→student) to fuse the skills into one controller that autonomously selects behaviors and transitions. Hardware experiments demonstrate zero-shot sim-to-real traversal of $0.8\,\text{m}$ platforms ($\approx 114\%$ of leg length) with robust adaptation to platform height and initial pose.

Abstract

Humanoid locomotion has advanced rapidly with deep reinforcement learning (DRL), enabling robust feet-based traversal over uneven terrain. Yet platforms beyond leg length remain largely out of reach because current RL training paradigms often converge to jumping-like solutions that are high-impact, torque-limited, and unsafe for real-world deployment. To address this gap, we propose APEX, a system for perceptive, climbing-based high-platform traversal that composes terrain-conditioned behaviors: climb-up and climb-down at vertical edges, walking or crawling on the platform, and stand-up and lie-down for posture reconfiguration. Central to our approach is a generalized ratchet progress reward for learning contact-rich, goal-reaching maneuvers. It tracks the best-so-far task progress and penalizes non-improving steps, providing dense yet velocity-free supervision that enables efficient exploration under strong safety regularization. Based on this formulation, we train LiDAR-based full-body maneuver policies and reduce the sim-to-real perception gap through a dual strategy: modeling mapping artifacts during training and applying filtering and inpainting to elevation maps during deployment. Finally, we distill all six skills into a single policy that autonomously selects behaviors and transitions based on local geometry and commands. Experiments on a 29-DoF Unitree G1 humanoid demonstrate zero-shot sim-to-real traversal of 0.8 meter platforms (approximately 114% of leg length), with robust adaptation to platform height and initial pose, as well as smooth and stable multi-skill transitions.
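The ratchet idea described above (keep the best progress seen so far, reward only strict improvements, penalize every other step) can be sketched in a few lines. This is a minimal illustration, not the paper's formulation: the class name `RatchetProgress`, the scalar `progress` measure, and the constants `gain` and `stall_penalty` are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class RatchetProgress:
    """Illustrative ratchet-style progress reward (hypothetical names and constants)."""

    gain: float = 1.0             # assumed reward per unit of newly gained progress
    stall_penalty: float = 0.01   # assumed penalty for a step that does not improve
    best: float = float("-inf")   # best-so-far task progress in the current episode

    def reset(self, initial_progress: float) -> None:
        # Start each episode with the ratchet set at the initial progress value.
        self.best = initial_progress

    def step(self, progress: float) -> float:
        # Reward only a strict improvement over the best-so-far value,
        # then advance the ratchet; otherwise apply the stall penalty.
        if progress > self.best:
            reward = self.gain * (progress - self.best)
            self.best = progress
            return reward
        return -self.stall_penalty


if __name__ == "__main__":
    ratchet = RatchetProgress()
    ratchet.reset(0.0)
    # Progress could be, for example, negative distance to the goal pose.
    for p in [0.1, 0.05, 0.1, 0.3]:
        print(p, ratchet.step(p))
```

Because the reward depends only on the progress value and the stored maximum, no velocity term appears, and regaining previously lost ground earns nothing until the earlier best is strictly exceeded.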

Paper Structure

This paper contains 35 sections, 6 equations, 13 figures, 12 tables.

Figures (13)

  • Figure 1: The robot adaptively traverses high platforms of up to $0.8\,\text{m}$ ($\approx 114\%$ of leg length) by leveraging diverse full-body behaviors, including climb-up, climb-down, stand-up, and lie-down. Enabled by LiDAR-based elevation mapping, the policy exhibits context-aware whole-body coordination, allowing continuous and robust traversal across challenging terrain (https://apex-humanoid.github.io).
  • Figure 2: Learning pipeline for high-platform traversal: Teacher Training uses RL with the Ratchet Progress Reward, where a "best-so-far" task-space reference ensures genuine advancement by only rewarding states that strictly surpass historical progress. These skills are unified into a single context-aware policy through Distillation, using a "divide-and-conquer" Data Sampling Rule across distributed environments to cover the full distribution of maneuvers and transitions. In Deployment, the humanoid robot performs end-to-end traversal using LiDAR-based elevation mapping for terrain adaptation.
  • Figure 3: The sim-to-real gap in LiDAR mapping is addressed through a dual approach that combines artifact modeling in simulation with real-world post-processing.
  • Figure 4: The robot regains balance and climbs up the high platform after being heavily kicked.
  • Figure 5: Real-world adaptation of the climb-up policy to varying platform heights ($0.6$–$0.8\,\text{m}$) and approach angles ($\theta \in [-65^\circ, 65^\circ]$). The policy exhibits coordinated whole-body behaviors and reliable zero-shot sim-to-real transfer, even in extreme out-of-distribution cases.
  • ...and 8 more figures
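
The deployment-side post-processing mentioned in the Figure 3 caption can be illustrated with a simple hole-filling pass over an elevation grid. The captions and abstract do not specify the filter or inpainting method used, so the sketch below substitutes a basic neighbor-averaging scheme; the function name `inpaint_elevation`, the NaN convention for missing cells, and the `max_iters` parameter are assumptions for illustration only.

```python
import math


def inpaint_elevation(grid, max_iters=100):
    """Fill NaN cells with the mean of their valid 4-neighbors (illustrative stand-in)."""
    rows, cols = len(grid), len(grid[0])
    filled = [row[:] for row in grid]  # copy so the input map is left untouched
    for _ in range(max_iters):
        holes = [
            (r, c)
            for r in range(rows)
            for c in range(cols)
            if math.isnan(filled[r][c])
        ]
        if not holes:
            break
        updates = {}
        for r, c in holes:
            neighbors = [
                filled[nr][nc]
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= nr < rows and 0 <= nc < cols
                and not math.isnan(filled[nr][nc])
            ]
            if neighbors:
                updates[(r, c)] = sum(neighbors) / len(neighbors)
        if not updates:
            break  # no valid cells left to propagate from
        # Apply all fills at once so each pass uses only the previous pass's values.
        for (r, c), value in updates.items():
            filled[r][c] = value
    return filled


if __name__ == "__main__":
    nan = float("nan")
    # Heights in meters: floor at 0.0, platform at 0.8, with an unobserved column.
    elevation = [[0.0, nan, 0.8], [0.0, nan, 0.8]]
    print(inpaint_elevation(elevation))
```

Averaging smooths a sharp platform edge into a ramp, which is one reason a real pipeline might prefer an edge-preserving fill; the sketch only shows where such a step would sit.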