BeamDojo: Learning Agile Humanoid Locomotion on Sparse Footholds

Huayi Wang, Zirui Wang, Junli Ren, Qingwei Ben, Tao Huang, Weinan Zhang, Jiangmiao Pang

TL;DR

BeamDojo is a two-stage reinforcement learning framework for humanoid locomotion on sparse footholds. It addresses two difficulties, polygonal feet and sparse foothold rewards, with a sampling-based foothold reward and a double-critic architecture. In Stage 1 the policy explores under relaxed terrain dynamics: it walks on flat terrain while observing the task terrain. In Stage 2 it is fine-tuned on the actual task terrain. Training is guided by perceptual terrain information from a LiDAR-based elevation map and reinforced by a terrain-aware curriculum. Results in simulation and on a Unitree G1 show high success rates, precise foothold placement, and robust performance under disturbances, with sim-to-real transfer aided by domain randomization. The work also analyzes design choices for foothold rewards, curricula, and perception strategies, highlighting the importance of elevation-map-based perception and two-stage training for generalization to non-flat terrains and real-world variability.

Abstract

Traversing risky terrains with sparse footholds poses a significant challenge for humanoid robots, requiring precise foot placements and stable locomotion. Existing learning-based approaches often struggle on such complex terrains due to sparse foothold rewards and inefficient learning processes. To address these challenges, we introduce BeamDojo, a reinforcement learning (RL) framework designed for enabling agile humanoid locomotion on sparse footholds. BeamDojo begins by introducing a sampling-based foothold reward tailored for polygonal feet, along with a double critic to balance the learning process between dense locomotion rewards and sparse foothold rewards. To encourage sufficient trial-and-error exploration, BeamDojo incorporates a two-stage RL approach: the first stage relaxes the terrain dynamics by training the humanoid on flat terrain while providing it with task-terrain perceptive observations, and the second stage fine-tunes the policy on the actual task terrain. Moreover, we implement an onboard LiDAR-based elevation map to enable real-world deployment. Extensive simulation and real-world experiments demonstrate that BeamDojo achieves efficient learning in simulation and enables agile locomotion with precise foot placement on sparse footholds in the real world, maintaining a high success rate even under significant external disturbances.
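
To make the double-critic idea in the abstract concrete, the sketch below shows one plausible way to balance a dense reward group against a sparse one: each group gets its own advantage estimate, each estimate is normalized on its own, and the two are blended with fixed weights. This is an illustration under stated assumptions, not the authors' implementation. The function names, the weights `w_dense` and `w_sparse`, and the use of GAE on a single non-terminating trajectory are all hypothetical choices.

```python
import numpy as np

def gae(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Generalized advantage estimation for one trajectory without termination."""
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    adv = np.zeros_like(rewards)
    next_value, running = float(last_value), 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD error for this reward group's critic.
        delta = rewards[t] + gamma * next_value - values[t]
        running = delta + gamma * lam * running
        adv[t] = running
        next_value = values[t]
    return adv

def normalize(x, eps=1e-8):
    """Zero-mean, unit-variance scaling so neither group dominates by magnitude."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / (x.std() + eps)

def combined_advantage(adv_dense, adv_sparse, w_dense=1.0, w_sparse=0.25):
    """Blend per-group advantages (weights are hypothetical, not from the paper)."""
    return w_dense * normalize(adv_dense) + w_sparse * normalize(adv_sparse)

# Example: dense locomotion rewards and a sparse foothold penalty, each with
# value predictions from its own (here zero-initialized) critic.
adv_loco = gae([1.0, 1.0, 1.0], np.zeros(3), 0.0)
adv_foot = gae([0.0, -5.0, 0.0], np.zeros(3), 0.0)
policy_advantage = combined_advantage(adv_loco, adv_foot)
```

Normalizing each group before mixing means the raw scale of the sparse penalty does not change the blend; only the weights do. That is the balancing property the sketch is meant to show.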

Paper Structure

This paper contains 39 sections, 8 equations, 10 figures, 10 tables.

Figures (10)

  • Figure 1: Foothold Reward. We sample $n$ points under the foot. Green points indicate contact with the surface within the safe region, while red points represent those not in contact with the surface.
  • Figure 2: Overview of BeamDojo. (a) Training in Simulation: In stage 1, observations and rewards are decoupled: proprioceptive information and locomotion rewards are obtained from flat terrain, while perceptive information and the foothold reward are obtained from the task terrain. The double critic module learns the two reward groups separately. In stage 2, the policy is fine-tuned on the task terrain, utilizing the full set of observations and rewards. (b) Real-world deployment: The robot-centric elevation map, reconstructed using LiDAR data, is combined with proprioceptive information to serve as the input for the actor.
  • Figure 3: Terrain Setting in Simulation. (a) is used for stage 1 training, while (b) and (c) are used for stage 2 training. The training terrain progression is listed from simple to difficult. (b)-(e) are used for evaluation.
  • Figure 4: Foothold Error Comparison. The foothold error benchmarks of all methods are evaluated in (a) stepping stones and (b) balancing beams, both tested under medium terrain difficulty.
  • Figure 5: Learning Efficiency. The learning curves show the maximum terrain levels achieved in two training stages of all methods. Faster attainment of terrain level 8 indicates more efficient learning.
  • ...and 5 more figures
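
The sampling idea in the Figure 1 caption can be sketched as follows: place a grid of points under each foot, look up the terrain height beneath every point, and penalize points that hang over a gap. This is a minimal sketch, not the paper's exact reward. The rectangular sole, its dimensions, the `depth_threshold` value, and the `beam_height` terrain are assumptions made for illustration.

```python
import numpy as np

def sample_sole_points(foot_xy, yaw, length=0.2, width=0.08, nx=5, ny=3):
    """Return an (nx*ny, 2) grid of world-frame points covering a rectangular sole."""
    xs = np.linspace(-length / 2, length / 2, nx)
    ys = np.linspace(-width / 2, width / 2, ny)
    local = np.stack(np.meshgrid(xs, ys, indexing="ij"), axis=-1).reshape(-1, 2)
    c, s = np.cos(yaw), np.sin(yaw)
    rot = np.array([[c, -s], [s, c]])
    # Rotate the foot-frame grid by the foot yaw, then translate to the foot center.
    return local @ rot.T + np.asarray(foot_xy, dtype=float)

def foothold_penalty(points_per_foot, in_contact, terrain_height, depth_threshold=-0.1):
    """Negative count of sampled points lying over terrain below the threshold.

    Only feet that are currently in contact contribute to the penalty.
    """
    penalty = 0.0
    for pts, contact in zip(points_per_foot, in_contact):
        if not contact:
            continue
        heights = np.array([terrain_height(x, y) for x, y in pts])
        penalty -= float(np.sum(heights < depth_threshold))
    return penalty

def beam_height(x, y):
    """Hypothetical terrain: a 0.1 m wide beam along x, with a 1 m drop beside it."""
    return 0.0 if abs(y) <= 0.05 else -1.0

# Stance foot centered on the beam versus shifted sideways by 4 cm.
centered = sample_sole_points((0.0, 0.0), yaw=0.0)
shifted = sample_sole_points((0.0, 0.04), yaw=0.0)
r_centered = foothold_penalty([centered], [True], beam_height)
r_shifted = foothold_penalty([shifted], [True], beam_height)
```

Because the penalty grows with the number of points off the surface, a foot that is only partly on the beam still receives a graded signal rather than a binary success or failure.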