Robust Recovery Controller for a Quadrupedal Robot using Deep Reinforcement Learning
Joonho Lee, Jemin Hwangbo, Marco Hutter
TL;DR
This work tackles robust fall recovery for quadrupedal robots by proposing a hierarchical deep reinforcement learning controller that separates self-righting, standing up, and locomotion into individual policies coordinated by a learned behavior selector. Training is performed entirely in simulation with a high-fidelity, randomized model, and the controller is then deployed on the ANYmal robot, aided by a neural height estimator that mitigates state drift. The approach achieves rapid, reactive recovery from arbitrary fall configurations, with a success rate exceeding 97% over more than 100 trials, and shows favorable sim-to-real transfer compared to a handcrafted finite-state-machine (FSM) baseline. The results highlight the practicality of behavior-based RL for complex, multi-contact maneuvers and lay the groundwork for extending the approach to varied terrain through further randomization and improved estimators.
Abstract
The ability to recover from a fall is an essential feature for a legged robot to navigate challenging environments robustly. To date, there has been very little progress on this topic. Current solutions mostly build upon predefined, often heuristic, trajectories, which result in unnatural behaviors and require considerable effort to engineer system-specific components. In this paper, we present an approach based on model-free Deep Reinforcement Learning (RL) to control recovery maneuvers of quadrupedal robots using a hierarchical behavior-based controller. The controller consists of four neural network policies: three behaviors and one behavior selector that coordinates them. Each policy is trained individually in simulation and deployed directly on a real system. We experimentally validate our approach on ANYmal, a dog-sized quadrupedal robot with 12 degrees of freedom. With our method, ANYmal exhibits dynamic and reactive recovery behaviors and recovers from an arbitrary fall configuration in less than 5 seconds. We tested the recovery maneuver more than 100 times, and the success rate was higher than 97%.
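The hierarchical structure described above, where one selector decides at every control step which of three behavior policies drives the 12 joints, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The `RobotState` fields, the behavior names, the threshold values, and the hand-written `select_behavior` rule are all hypothetical. In the actual controller both the selector and the behaviors are trained neural networks, and their observations are richer than the two scalars used here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

NUM_JOINTS = 12  # ANYmal has 12 degrees of freedom


@dataclass
class RobotState:
    # Hypothetical, heavily simplified observation.
    base_up_z: float    # z-component of the body "up" axis in the world frame: 1 upright, -1 upside down
    base_height: float  # estimated base height above ground [m]


Policy = Callable[[RobotState], List[float]]


# Placeholder behavior policies: each stands in for a trained network that
# maps an observation to 12 joint targets. They return zeros here.
def self_righting(state: RobotState) -> List[float]:
    return [0.0] * NUM_JOINTS


def standing_up(state: RobotState) -> List[float]:
    return [0.0] * NUM_JOINTS


def locomotion(state: RobotState) -> List[float]:
    return [0.0] * NUM_JOINTS


BEHAVIORS: Dict[str, Policy] = {
    "self_righting": self_righting,
    "standing_up": standing_up,
    "locomotion": locomotion,
}


def select_behavior(state: RobotState) -> str:
    # Hand-written stand-in for the learned behavior selector.
    # Thresholds are hypothetical and chosen only for illustration.
    if state.base_up_z < 0.5:    # lying on its side or back
        return "self_righting"
    if state.base_height < 0.35:  # upright but still sitting on the ground
        return "standing_up"
    return "locomotion"


def control_step(state: RobotState) -> Tuple[str, List[float]]:
    # The selector is re-evaluated at every step, so a fall during
    # locomotion immediately switches control back to self-righting.
    name = select_behavior(state)
    return name, BEHAVIORS[name](state)


if __name__ == "__main__":
    for s in [RobotState(-1.0, 0.1), RobotState(1.0, 0.1), RobotState(1.0, 0.5)]:
        print(control_step(s)[0])
```

Running the script prints `self_righting`, `standing_up`, and `locomotion` for an upside-down, a sitting, and a standing robot, respectively. Keeping the behaviors as separate policies behind a single dispatcher mirrors the abstract's point that each network can be trained individually and then composed.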
