Learning Agile Locomotion on Risky Terrains

Chong Zhang, Nikita Rudin, David Hoeller, Marco Hutter

TL;DR

This work tackles agile quadrupedal locomotion on risky terrains with sparse footholds by recasting locomotion as a navigation task and training policies with end-to-end RL. It introduces a three-pronged exploration strategy (a curriculum with relaxed progression, intrinsic curiosity, and symmetry-based augmentation) and a two-stage generalist-specialist training pipeline that lets sensorimotor skills be reused across terrains. Simulation and real-world experiments on an ANYmal-D robot show peak speeds of at least $2.5\,\mathrm{m/s}$ on stepping stones and beams, with successful sim-to-real transfer for two terrain types. The results demonstrate robust, diverse locomotion on challenging terrains. The authors acknowledge limitations in unified policy learning and reward design, and point future work toward onboard perception and interpretable, unified policies.

Abstract

Quadruped robots have shown remarkable mobility on various terrains through reinforcement learning. Yet, in the presence of sparse footholds and risky terrains such as stepping stones and balance beams, which require precise foot placement to avoid falls, model-based approaches are often used. In this paper, we show that end-to-end reinforcement learning can also enable the robot to traverse risky terrains with dynamic motions. To this end, our approach involves training a generalist policy for agile locomotion on disorderly and sparse stepping stones before transferring its reusable knowledge to various more challenging terrains by finetuning specialist policies from it. Given that the robot needs to rapidly adapt its velocity on these terrains, we formulate the task as a navigation task instead of the commonly used velocity tracking, which constrains the robot's behavior, and propose an exploration strategy to overcome sparse rewards and achieve high robustness. We validate our proposed method through simulation and real-world experiments on an ANYmal-D robot, achieving a peak forward velocity of $\geq 2.5$ m/s on sparse stepping stones and narrow balance beams. Video: youtu.be/Z5X0J8OH6z4
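
The contrast the abstract draws between velocity tracking and a navigation task can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the exponential and inverse-distance reward shapes, and the `sigma` and `window` parameters are hypothetical and are not taken from the paper. The sketch only shows why a velocity-tracking reward ties the robot to a commanded speed at every step, while a goal-based reward that is active only near the end of an episode leaves the robot free to speed up or slow down along the way, at the cost of a sparser learning signal.

```python
import math


def velocity_tracking_reward(v_xy, v_cmd_xy, sigma=0.25):
    """Dense reward (hypothetical form): high only while the base velocity matches the command."""
    err = (v_xy[0] - v_cmd_xy[0]) ** 2 + (v_xy[1] - v_cmd_xy[1]) ** 2
    return math.exp(-err / sigma)


def navigation_reward(pos_xy, goal_xy, t, episode_len, window=1.0):
    """Sparse reward (hypothetical form): scores closeness to the goal,
    but only during the final `window` seconds of the episode."""
    if t < episode_len - window:
        # Earlier in the episode the robot is free to choose its own velocity.
        return 0.0
    dist_sq = (pos_xy[0] - goal_xy[0]) ** 2 + (pos_xy[1] - goal_xy[1]) ** 2
    return 1.0 / (1.0 + dist_sq)


if __name__ == "__main__":
    # Slowing down before a risky step is penalized under velocity tracking...
    print(velocity_tracking_reward((0.2, 0.0), (1.0, 0.0)))
    # ...but costs nothing under the navigation formulation while time remains.
    print(navigation_reward((0.5, 0.0), (2.0, 0.0), t=1.0, episode_len=6.0))
```

Under these assumptions, the navigation reward is zero for most of the episode, which is the kind of sparsity that an exploration strategy would need to address.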

Paper Structure (28 sections, 2 equations, 11 figures, 5 tables)

Figures (11)

  • Figure 1: The ANYmal-D robot executing the learned RL policies on stepping stones, balance beams with gaps, and stepping beams. Top left: the stepping stones were $20$ cm wide and took up only $\sim 10\%$ of the whole area, and the height difference between high and low stones was $12$ cm. The robot reached a peak forward velocity of $2.7$ m/s when traversing the stones from left to right. Top right: the terrain consisted of two $100$ cm $\times$ $20$ cm beams and three gaps. The peak forward velocity of the robot was $2.5$ m/s from right to left. Bottom: the widths of the stepping beams varied in $[12, 17]$ cm, the horizontal distances between two beams varied in $[30, 60]$ cm, and the vertical distances varied in $[0, 20]$ cm. For practical reasons, we tested this policy only in simulation, where the peak forward velocity of the robot reached $1.5$ m/s from left to right.
  • Figure 2: An overview of our methodology. The learning system is illustrated on the left. Different components are listed on the right.
  • Figure 3: Different types of terrains with difficulty levels 0, 5, and 9 from left to right. One terrain type can have different subtypes that stress different features, as explained in the terrains subsection.
  • Figure 4: Infeasible crawling motions learned on balance beams as a result of inappropriate terrain curriculum design. The terrain shown here is far less challenging than the real-world terrains, yet the robot cannot traverse it. When the beam is narrow, the robot cannot find its next step on the beam.
  • Figure 5: The robot walking around on the hardest "Stones-Everywhere" terrain in simulation. The yellow boxes indicate a series of target positions and headings that we provided to the robot to achieve omnidirectional locomotion. The robot used different gait patterns in different situations, e.g., a 3-contact walk or 2-contact trot when safe footholds ahead were easy to find, and a 2-contact pace otherwise. Feet in contact with the terrain are marked.
  • ...and 6 more figures