Walking with Terrain Reconstruction: Learning to Traverse Risky Sparse Footholds

Ruiqi Yu; Qianshi Wang; Yizhen Wang; Zhicheng Wang; Jun Wu; Qiuguo Zhu

TL;DR

The paper demonstrates that end-to-end reinforcement learning relying solely on proprioception and depth images can traverse risky terrains with high sparsity and randomness, and deploys the resulting policy on a low-cost quadrupedal robot.

Abstract

Traversing risky terrains with sparse footholds presents significant challenges for legged robots, requiring precise foot placement in safe areas. To acquire comprehensive exteroceptive information, prior studies have employed motion capture systems or mapping techniques to generate heightmaps for the locomotion policy. However, these approaches require specialized pipelines and often introduce additional noise. While depth images from egocentric vision systems are cost-effective, their limited field of view and sparse information hinder the integration of terrain structure details into implicit features, which are essential for generating precise actions. In this paper, we demonstrate that end-to-end reinforcement learning relying solely on proprioception and depth images is capable of traversing risky terrains with high sparsity and randomness. Our method introduces local terrain reconstruction, leveraging the benefits of clear features and sufficient information from the heightmap, which serves as an intermediary for visual feature extraction and motion generation. This allows the policy to effectively represent and memorize critical terrain information. We deploy the proposed framework on a low-cost quadrupedal robot, achieving agile and adaptive locomotion across various challenging terrains and showcasing outstanding performance in real-world scenarios. Video at: youtu.be/Rj9v5EZsn-M.

Paper Structure (24 sections, 4 equations, 8 figures, 5 tables)

INTRODUCTION
METHOD
Terrain-Aware Locomotion Policy
Observations and Action
Rewards
I-E Estimator
Terrain Reconstructor
Adaptive Sampling
Training Environment
Terrain and Terrain Progressive Curriculum
Command Sampling
EXPERIMENT
Experimental Setup
Robot and Simulation
Ablation Studies
...and 9 more sections

Figures (8)

Figure 1: The proposed framework employs local terrain reconstruction to extract comprehensive and detailed terrain information from limited visual perception, significantly enhancing the policy's environmental understanding. (A) Using end-to-end RL, it enables agile locomotion across risky terrains with sparse footholds, such as stepping stones, balance beams, stepping beams and gaps, showcasing remarkable flexibility and adaptability in the real world. (B) Reconstructed local heightmap in the real world and simulation.
Figure 2: Overview of the proposed framework. The end-to-end framework consists of two key modules: a locomotion policy (gray) and a terrain reconstructor (pink). The terrain reconstructor takes depth images as input to generate local terrain heightmap reconstruction $\hat{H}^{\text{map}}_t$ while the locomotion policy performs implicit-explicit estimation and subsequently generates actions. All components are jointly optimized by PPO and supervised learning within the same training stage.
Figure 3: The terrain reconstructor reconstructs and refines the local terrain heightmap, including regions beneath and behind the robot, from proprioception features and depth images. During training, we employ adaptive sampling (AdaSmpl) to reduce the learning difficulty of the locomotion policy in the early stages.
Figure 4: The robots first learn basic locomotion skills on the lower-randomness stepping stones (light green), and are then progressively transitioned to more challenging terrains with sparse footholds (light blue).
Figure 5: Ablation studies in simulation. The left and middle columns record the success rate (line) and traversal rate (bar) for training pipeline design and model architecture design. The x-axis represents terrain difficulty, defined by sparsity (with maximum randomization) for stepping stones and stepping beams, and by beam and gap width for balance beams and gaps. The right column shows the mean edge violations (MEV) of each policy at the highest difficulty level.
...and 3 more figures
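
The Figure 2 caption states that all components are jointly optimized by PPO and supervised learning within the same training stage. The sketch below illustrates one way such a combined objective could be formed. It is not the authors' implementation: the mean-squared-error form of the reconstruction loss, the weight `recon_weight`, the clipping value `clip_eps`, and all function names are assumptions made for illustration only.

```python
import numpy as np


def ppo_clip_loss(ratio, advantage, clip_eps=0.2):
    """Standard PPO clipped surrogate, negated so that lower is better."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage
    return -np.mean(np.minimum(unclipped, clipped))


def reconstruction_loss(pred_heightmap, true_heightmap):
    """Supervised term: MSE between predicted and ground-truth heightmaps.

    The MSE form is an assumption; the source only says "supervised learning".
    """
    return np.mean((pred_heightmap - true_heightmap) ** 2)


def joint_loss(ratio, advantage, pred_heightmap, true_heightmap, recon_weight=1.0):
    """Sum of the RL term and the weighted supervised term (hypothetical weighting)."""
    rl_term = ppo_clip_loss(ratio, advantage)
    sup_term = reconstruction_loss(pred_heightmap, true_heightmap)
    return rl_term + recon_weight * sup_term
```

In simulation the ground-truth heightmap is available from the terrain itself, which is what makes a supervised term like this possible alongside the policy-gradient update.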
