Reinforcement Learning for Blind Stair Climbing with Legged and Wheeled-Legged Robots

Simon Chamorro; Victor Klemm; Miguel de la Iglesia Valls; Christopher Pal; Roland Siegwart

TL;DR

This work tackles the challenge of stair climbing for legged and wheeled-legged robots using a reinforcement-learning controller framed as a position-based task. It introduces an asymmetric actor-critic that exploits privileged information during training while operating solely on proprioceptive observations during deployment, augmented by a boolean terrain mode switch for stairs. The method demonstrates sim-to-real transfer, enabling Ascento to climb $15\,\mathrm{cm}$ stairs in the real world, and shows robust performance across multiple robot platforms. The contributions advance perception-free, versatile stair navigation for diverse legged systems and highlight the importance of curriculum training and domain randomization.

Abstract

In recent years, legged and wheeled-legged robots have gained prominence for tasks in environments built predominantly for humans. A significant challenge for many of these robots is their limited ability to navigate stairs, which hampers their use in multi-story environments. This study proposes a method that addresses this limitation by using reinforcement learning (RL) to develop a versatile controller applicable to a wide range of robots. In contrast to conventional velocity-based controllers, our approach builds upon a position-based formulation of the RL task, which we show to be vital for stair climbing. Furthermore, the method uses an asymmetric actor-critic structure, which makes privileged information from the simulated environment available during training while eliminating the reliance on exteroceptive sensors during real-world deployment. Another key feature of the proposed approach is a boolean observation in the controller's input that activates or deactivates a stair-climbing mode. We present results on several quadrupedal and bipedal robots in simulation and showcase how our method allows the balancing robot Ascento to climb 15 cm stairs in the real world, a task that was previously impossible for this robot.
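
The asymmetric actor-critic structure and the boolean stair-mode observation described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the dimensions, the network sizes, the three-component goal encoding, and the helper names (`build_observation`, `act`, `value`) are all assumptions made for illustration. The only property it aims to show is that the critic sees privileged terrain information while the actor sees only the deployable observation, including the stair-mode flag.

```python
import numpy as np

rng = np.random.default_rng(0)

# All sizes below are hypothetical and chosen only for illustration.
PROPRIO_DIM = 12   # proprioceptive measurements (joint states, IMU, ...)
GOAL_DIM = 3       # goal pose relative to the robot (assumed x, y, yaw)
PRIV_DIM = 20      # privileged terrain height samples, simulation only
ACTION_DIM = 4     # number of actuated joints
OBS_DIM = PROPRIO_DIM + GOAL_DIM + 1  # +1 for the boolean stair-mode flag


def init_mlp(sizes):
    """Create small random weights for a fully connected network."""
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def mlp(params, x):
    """Forward pass with tanh on hidden layers and a linear output."""
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.tanh(x)
    return x


def build_observation(proprio, goal_in_base, stair_mode):
    """Assemble the actor input; the last entry is the stair-mode flag."""
    flag = [1.0 if stair_mode else 0.0]
    return np.concatenate([proprio, goal_in_base, flag])


actor = init_mlp([OBS_DIM, 32, ACTION_DIM])     # used in training and deployment
critic = init_mlp([OBS_DIM + PRIV_DIM, 32, 1])  # used in training only


def act(obs):
    """Actor: maps the deployable observation to an action."""
    return mlp(actor, obs)


def value(obs, privileged):
    """Critic: additionally receives privileged terrain information."""
    return mlp(critic, np.concatenate([obs, privileged]))[0]
```

During training, both `act` and `value` would be evaluated at every step; on the real robot only `act` is needed, so no terrain measurement has to be available there.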

Paper Structure (25 sections, 6 figures, 3 tables)

This paper contains 25 sections, 6 figures, 3 tables.

Introduction
Literature Review
Legged Locomotion using RL
Hybrid Wheeled-Legged Locomotion
Methodology
Learning Environment
Simulator
Learning Algorithm
Terrains
Robot Model
Task Formulation
State
Actions
Rewards
Curriculum
...and 10 more sections

Figures (6)

Figure 1: Proposed Method: Ascento Robot, Unitree Go1, Cassie and ANYmal on Wheels climbing steps.
Figure 2: System Overview during Training and Deployment: At every training step, the algorithm receives the observation and privileged information. The actor outputs an action for the next simulation step. During deployment, the actor receives only the observation and outputs an action for the robot to execute.
Figure 3: Different Training Terrains: First row left to right: Single step, staircase, smooth pyramid. Second row: rough pyramid, discrete obstacles, smooth slope.
Figure 4: Task setup: a) The goal pose is marked in green and red. The yellow dots represent the terrain height measurements, which are part of the privileged information about the terrain. b) Training curriculum: Different training terrains (see the Terrains section), progressively harder from left to right. The stairs terrains, where the terrain boolean is set to 1 during training, are delimited in green. In the other terrains, this observation is set to 0.
Figure 5: Step motion: Velocity profile of the step-up motion with our method.
...and 1 more figure
