Humanoid Whole-Body Locomotion on Narrow Terrain via Dynamic Balance and Reinforcement Learning

Weiji Xie, Chenjia Bai, Jiyuan Shi, Junkai Yang, Yunfei Ge, Weinan Zhang, Xuelong Li

TL;DR

Humanoid robots face challenges maintaining dynamic balance on extreme terrains without external perception. The authors propose DBHL, a proprioception-only RL framework that uses a ZMP-based ZML reward and an asymmetric actor-critic to achieve coordinated whole-body locomotion, augmented by angular momentum regularization, multiplicative action noise, reward vectorization, symmetry regularization, and domain randomization. Through extensive simulation and real-world experiments on a full-sized Unitree H1-2, DBHL demonstrates superior stability and adaptability on narrow paths, disturbances, and irregular obstacles, with successful sim-to-real transfer. The work advances perception-free robust locomotion for humanoids in challenging environments and provides a scalable framework for dynamic balance on complex terrain.

Abstract

Humans possess delicate dynamic balance mechanisms that enable them to maintain stability across diverse terrains and under extreme conditions. However, despite significant recent advances, existing locomotion algorithms for humanoid robots still struggle to traverse extreme environments, especially when external perception (e.g., vision or LiDAR) is unavailable. This is because current methods often rely on gait-based or perception-conditioned rewards and lack effective mechanisms for handling unobservable obstacles and sudden balance loss. To address this challenge, we propose a novel whole-body locomotion algorithm based on dynamic balance and Reinforcement Learning (RL) that enables humanoid robots to traverse extreme terrains, particularly narrow pathways and unexpected obstacles, using only proprioception. Specifically, we introduce a dynamic balance mechanism that combines an extended measure of Zero-Moment Point (ZMP)-driven rewards with task-driven rewards in a whole-body actor-critic framework, aiming to achieve coordinated actions of the upper and lower limbs for robust locomotion. Experiments conducted on a full-sized Unitree H1-2 robot verify the ability of our method to maintain balance on extremely narrow terrains and under external disturbances, demonstrating its effectiveness in enhancing the robot's adaptability to complex environments. Videos are available at https://whole-body-loco.github.io.
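
As a rough illustration of what a ZMP-driven reward can look like, the sketch below estimates the ZMP from the center of mass using the standard point-mass approximation (angular momentum rate neglected, flat ground) and scores its distance to a target point. This is not the reward defined in the paper: the function names, the Gaussian-shaped kernel, and the `sigma` value are assumptions chosen only for illustration.

```python
import numpy as np

G = 9.81  # gravitational acceleration in m/s^2


def zmp_point_mass(com_pos, com_acc, ground_height=0.0):
    """Approximate the ZMP (x, y) on flat ground from CoM position and acceleration.

    Point-mass model with the angular momentum rate neglected:
        p_xy = c_xy - (c_z - z_ground) / (c_z_ddot + g) * c_xy_ddot
    """
    com_pos = np.asarray(com_pos, dtype=float)
    com_acc = np.asarray(com_acc, dtype=float)
    height = com_pos[2] - ground_height
    vertical = com_acc[2] + G
    if vertical <= 1e-6:
        # No ground reaction force (free fall): the ZMP is undefined.
        raise ValueError("vertical reaction is non-positive; ZMP undefined")
    return com_pos[:2] - (height / vertical) * com_acc[:2]


def zmp_reward(zmp_xy, target_xy, sigma=0.05):
    """Hypothetical reward in (0, 1]: equals 1 when the ZMP sits on the target.

    The Gaussian-shaped kernel and sigma (meters) are illustrative assumptions.
    """
    dist = np.linalg.norm(np.asarray(zmp_xy, float) - np.asarray(target_xy, float))
    return float(np.exp(-((dist / sigma) ** 2)))


if __name__ == "__main__":
    # CoM 1 m above the ground, accelerating forward at 0.981 m/s^2.
    zmp = zmp_point_mass([0.1, 0.2, 1.0], [0.981, 0.0, 0.0])
    print("ZMP estimate:", zmp)
    print("reward vs. target (0, 0.2):", zmp_reward(zmp, [0.0, 0.2]))
```

A forward CoM acceleration moves the estimated ZMP backward relative to the CoM projection, which is why a reward of this kind penalizes motions that push the ZMP away from the support region.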

Paper Structure

This paper contains 27 sections, 12 equations, 10 figures, 3 tables.

Figures (10)

  • Figure 1: The locomotion capabilities of a full-sized humanoid without vision or LiDAR sensors. (a) Narrow Path (25 cm): The humanoid traverses a narrow pathway, including slopes and stairs, demonstrating precise foot placement and dynamic balance. (b) Unknown Obstacle: The humanoid adapts quickly as a moving stick attempts to trip it, maintaining stability throughout. (c) Carry Payload: The method maintains stability while the robot carries loads. (d) Dense Conical Obstacles: The humanoid steps over a series of closely spaced cones, exhibiting agility and coordination. (e) External Pushes: The system recovers from external forces applied during locomotion over uneven terrain. Each scenario illustrates DBHL's versatility and effectiveness under complex conditions.
  • Figure 2: The overall training process of the proposed method.
  • Figure 3: Illustration of ZMP-based reward in different locomotion conditions. The brown dot represents the approximated center of the support polygon, $\bm p_\text{csp}$, and the green dot is the projection of point $\bm p_\text{csp}$ onto the ZML in the horizontal plane.
  • Figure 4: Visualization of the various training terrains of our method in Isaac Gym.
  • Figure 5: Comparison of our method to baselines across terrains and difficulty levels. The results show that whole-body control is essential for DBHL and that the dynamic balance mechanism is more effective than phase-based gait in challenging conditions. Each setting is evaluated over 3 random seeds. The shaded region around each curve represents the $\pm1\sigma$ range, indicating the variability of the results.
  • ...and 5 more figures
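
The Figure 3 caption describes projecting $\bm p_\text{csp}$ onto the ZML in the horizontal plane. The caption does not define the ZML, so the sketch below assumes it can be treated as a straight 2D line given by a point and a direction; the function name and this parametrization are hypothetical and serve only to illustrate the geometric step.

```python
import numpy as np


def project_onto_line(point, line_point, line_dir):
    """Orthogonally project a 2D point onto the line through line_point along line_dir."""
    p = np.asarray(point, dtype=float)
    a = np.asarray(line_point, dtype=float)
    d = np.asarray(line_dir, dtype=float)
    norm = np.linalg.norm(d)
    if norm < 1e-9:
        raise ValueError("line direction must be non-zero")
    d = d / norm
    # Scalar offset along the line, then back to a point on the line.
    return a + np.dot(p - a, d) * d


if __name__ == "__main__":
    p_csp = np.array([1.0, 1.0])  # assumed support-polygon center (x, y)
    proj = project_onto_line(p_csp, line_point=[0.0, 0.0], line_dir=[2.0, 0.0])
    print("projection:", proj)
    print("distance to line:", np.linalg.norm(p_csp - proj))
```

The distance between the point and its projection is the kind of scalar a balance reward could penalize, although the exact form used by the authors is not given in this section.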