LIPM-Guided Reinforcement Learning for Stable and Perceptive Locomotion in Bipedal Robots

Haokai Su, Haoxiang Luo, Shunpeng Yang, Kaiwen Jiang, Wei Zhang, Hua Chen

TL;DR

The paper tackles stable perceptive locomotion for bipedal robots in unstructured outdoor environments by integrating LIPM theory with reinforcement learning within a vision-enabled Concurrent Teacher-Student framework. It introduces a LIPM-guided reward design, a Reward Fusion Module to prioritize stability over velocity tracking, and a double-critic architecture to separately optimize stability and locomotion, enabling robust learning. CoM trajectory generation leverages a constraint plane with variable height, yielding equations such as $\boldsymbol{p}_{com} = \frac{z}{g} \ddot{\boldsymbol{p}}_{com} + \boldsymbol{p}_{ZMP}$ and $\hat{\boldsymbol{p}}_{com} = \boldsymbol{p}_{ZMP} + \frac{z}{g} k_p (\boldsymbol{v}^{cmd}_{xy} - \boldsymbol{v}_{xy})$, with $z_c$ fixed to upright height and zero angular momentum. Extensive simulation and real-world outdoor experiments show improved terrain adaptability, disturbance rejection, and robust perception-enabled locomotion across speeds and perceptual conditions, validating the approach’s practical impact. The work effectively bridges model-based stability insights with learning-based perception, and points toward integrating more expressive dynamics models such as SLIP or VHIP for even broader agile behavior in unstructured terrains.
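
Read together, the two CoM equations above suggest that the reference position is obtained by replacing the CoM acceleration in the LIPM relation with a proportional velocity-tracking term, $k_p (\boldsymbol{v}^{cmd}_{xy} - \boldsymbol{v}_{xy})$. The following is a minimal sketch of that computation only, not the authors' implementation. The function name `lipm_reference_com`, the argument names, and the numeric values in the usage lines are hypothetical and chosen purely for illustration.

```python
G = 9.81  # gravitational acceleration in m/s^2


def lipm_reference_com(p_zmp, v_cmd, v_xy, z, k_p):
    """Return the reference horizontal CoM position (x, y).

    Implements p_hat_com = p_zmp + (z / g) * k_p * (v_cmd - v_xy),
    applied independently to the x and y components.
    All names here are illustrative assumptions, not the paper's code.
    """
    scale = (z / G) * k_p
    return tuple(
        p + scale * (vc - v)
        for p, vc, v in zip(p_zmp, v_cmd, v_xy)
    )


# Hypothetical numbers: the robot moves at 0.5 m/s in x while 1.0 m/s is commanded.
ref = lipm_reference_com(
    p_zmp=(0.0, 0.0),
    v_cmd=(1.0, 0.0),
    v_xy=(0.5, 0.0),
    z=0.981,   # assumed CoM height in metres
    k_p=2.0,   # assumed proportional gain
)
print(ref)  # x is roughly 0.1 m ahead of the ZMP, y stays at 0.0
```

With these assumed values the reference CoM sits ahead of the ZMP in the direction of the velocity error, which is the behaviour the equation describes: a larger tracking error or a higher CoM shifts the reference further from the support point.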

Abstract

Achieving stable and robust perceptive locomotion for bipedal robots in unstructured outdoor environments remains a critical challenge due to complex terrain geometry and susceptibility to external disturbances. In this work, we propose a novel reward design inspired by the Linear Inverted Pendulum Model (LIPM) to enable perceptive and stable locomotion in the wild. The LIPM provides theoretical guidance for dynamic balance by regulating the center of mass (CoM) height and the torso orientation. These are key factors for terrain-aware locomotion, as they help ensure a stable viewpoint for the robot's camera. Building on this insight, we design a reward function that promotes balance and dynamic stability while encouraging accurate CoM trajectory tracking. To adaptively trade off between velocity tracking and stability, we leverage the Reward Fusion Module (RFM) approach that prioritizes stability when needed. A double-critic architecture is adopted to separately evaluate stability and locomotion objectives, improving training efficiency and robustness. We validate our approach through extensive experiments on a bipedal robot in both simulation and real-world outdoor environments. The results demonstrate superior terrain adaptability, disturbance rejection, and consistent performance across a wide range of speeds and perceptual conditions.

Paper Structure

This paper contains 18 sections, 10 equations, 7 figures, 5 tables.

Figures (7)

  • Figure 1: Demonstration of stable locomotion across diverse and challenging terrains. Our point-foot bipedal robot successfully traverses various outdoor environments—including grass, gravel, slopes, curbs, stairs, and elevated platforms—while maintaining stability without external support or safety tethers.
  • Figure 2: A cross-sectional view of the LIPM in the $zx$-plane for a point-foot bipedal locomotion system. The motion constraint plane in 3D is reduced to a constraint line in this 2D representation.
  • Figure 3: Overview of the Vision-CTS learning framework. Agents are divided into teacher and student groups according to their access to observation modalities. The blue dashed lines indicate PPO (Schulman et al., 2017) gradient flow, while the red dashed lines denote supervised learning signals.
  • Figure 4: Terrains used in the simulation evaluation. The slopes have gradients up to $32.10^{\circ}$. The rough terrain contains uniform noise with heights ranging from 2 cm to 15 cm. The stairs have a height of 20 cm. The discrete obstacles vary in height from 2 cm to 36 cm.
  • Figure 5: Average Success Rate by velocity dimension for each policy.
  • ...and 2 more figures