Robust Recovery Controller for a Quadrupedal Robot using Deep Reinforcement Learning

Joonho Lee, Jemin Hwangbo, Marco Hutter

TL;DR

This work tackles robust fall recovery for quadrupedal robots by proposing a hierarchical deep reinforcement learning controller that separates self-righting, standing up, and locomotion into individual policies coordinated by a learned behavior selector. Training is performed entirely in simulation with a high-fidelity, randomized model and then deployed on the ANYmal robot, aided by a neural height estimator to mitigate state drift. The approach achieves rapid, reactive recovery from arbitrary fall configurations with a success rate exceeding 97% across 100+ trials, and demonstrates favorable sim-to-real transfer compared to a handcrafted FSM baseline. The results highlight the practicality of behavior-based RL for complex, multi-contact maneuvers and lay groundwork for extending to varied terrains via further randomization and estimator improvements.

Abstract

The ability to recover from a fall is essential for a legged robot to navigate challenging environments robustly. To date, there has been very little progress on this topic. Current solutions mostly build on heuristically predefined trajectories, which results in unnatural behaviors and requires considerable effort to engineer system-specific components. In this paper, we present an approach based on model-free deep reinforcement learning (RL) that controls recovery maneuvers of quadrupedal robots using a hierarchical behavior-based controller. The controller consists of four neural network policies: three behaviors and one behavior selector that coordinates them. Each policy is trained individually in simulation and deployed directly on the real system. We experimentally validate our approach on ANYmal, a dog-sized quadrupedal robot with 12 degrees of freedom. With our method, ANYmal exhibits dynamic and reactive behaviors and recovers from an arbitrary fall configuration in less than 5 seconds. We tested the recovery maneuver more than 100 times, and the success rate was higher than 97%.
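
The hierarchical structure described above (three behavior policies plus one selector that picks among them at each control step) can be sketched roughly as follows. This is a minimal illustration only, not the authors' implementation: the class and function names, the two-element observation layout, and the thresholds are all hypothetical, and the hand-written rule in `toy_selector` merely stands in for the learned behavior selector. The constant policies likewise stand in for trained networks.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Observation = Sequence[float]
Action = List[float]
Policy = Callable[[Observation], Action]

# The three behaviors named in the TL;DR; the ordering is an assumption.
BEHAVIORS = ("self_righting", "standing_up", "locomotion")
NUM_JOINTS = 12  # ANYmal has 12 degrees of freedom (from the abstract)


def make_constant_policy(value: float) -> Policy:
    """Placeholder for a trained network: same target for every joint."""
    def policy(obs: Observation) -> Action:
        return [value] * NUM_JOINTS
    return policy


def toy_selector(obs: Observation) -> int:
    """Placeholder for the learned selector (hypothetical rule and thresholds).

    Assumes obs[0] is an uprightness measure in [-1, 1] and obs[1] is a
    base-height estimate in meters. Neither layout comes from the paper.
    """
    upright, height = obs[0], obs[1]
    if upright < 0.5:
        return 0  # robot is tipped over: self-right first
    if height < 0.4:
        return 1  # upright but low: stand up
    return 2      # upright and standing: walk


class HierarchicalController:
    """Dispatches each control step to the behavior chosen by the selector."""

    def __init__(self, behaviors: Dict[str, Policy],
                 selector: Callable[[Observation], int]) -> None:
        self.behaviors = behaviors
        self.selector = selector

    def step(self, obs: Observation) -> Tuple[str, Action]:
        name = BEHAVIORS[self.selector(obs)]
        return name, self.behaviors[name](obs)


controller = HierarchicalController(
    behaviors={
        "self_righting": make_constant_policy(0.1),
        "standing_up": make_constant_policy(0.2),
        "locomotion": make_constant_policy(0.3),
    },
    selector=toy_selector,
)

# Example: a robot lying on its back with a low base height.
behavior_name, joint_targets = controller.step([-1.0, 0.1])
```

Keeping the selector separate from the behaviors means each behavior can be trained and replaced individually, which matches the abstract's statement that the policies are trained one by one.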

Paper Structure

This paper contains 27 sections, 11 figures, 2 tables, 1 algorithm.

Figures (11)

  • Figure 1: A recovery maneuver of ANYmal. (Top left) ANYmal is initialized in a fall configuration. (Top row) ANYmal swings its legs to gain momentum, (Middle row) pushes against the ground to regain an upright, stable posture, and then (Bottom row) stands up and walks.
  • Figure 2: Control architecture for the recovery controller. TSIF refers to the Two State Implicit Filter (Bloesch et al., 2018).
  • Figure 3: (a) Sampled initial states and (b) the target configuration of the self-righting task.
  • Figure 4: FSM for behavior selection.
  • Figure 5: Simulation for ANYmal.
  • ...and 6 more figures