RecoveryChaining: Learning Local Recovery Policies for Robust Manipulation
Shivam Vats, Devesh K. Jha, Maxim Likhachev, Oliver Kroemer, Diego Romeres
TL;DR
RecoveryChaining (RC) is a hierarchical reinforcement learning framework that learns robust recovery policies for multi-step manipulation. It uses a hybrid action space of primitive actions and temporally extended nominal options, which transfer control to model-based controllers. The method includes Failure Discovery, which gathers failure cases in simulation, and Recovery Learning, which trains policies that steer the robot back to a state satisfying a nominal controller's precondition, using Monte-Carlo estimates of task success as rewards. Lazy RecoveryChaining (Lazy RC) further speeds up learning by using high-precision binary classifiers to invoke nominal options lazily. Across pick-and-place, shelf, and cluttered-shelf domains with sparse rewards, RC and especially Lazy RC achieve significantly higher recovery success than baselines. The learned policies also transfer to a physical robot without additional fine-tuning, demonstrating sim-to-real viability. The work highlights the value of combining model-based controllers with learned recoveries, while noting limitations such as dependence on accurate simulators, assumptions about option initiation sets, and difficulty under strong partial observability.
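The Monte-Carlo reward mentioned above can be illustrated with a minimal sketch. It assumes a hypothetical one-dimensional state (a residual position error) and a hypothetical toy nominal controller, `toy_nominal_rollout`, that succeeds when the noisy final error lies within a tolerance. `mc_success_estimate` averages the binary outcomes of repeated rollouts from the same state. Neither function comes from the paper, and the uniform noise model, tolerance, and rollout count are illustrative assumptions.

```python
import random
from typing import Callable


def toy_nominal_rollout(state: float, rng: random.Random,
                        noise: float = 0.2, tolerance: float = 0.5) -> bool:
    """Hypothetical nominal controller: one rollout from `state`.

    Assumption: the controller succeeds if the residual error, perturbed by
    uniform actuation noise in [-noise, noise], ends within `tolerance`.
    """
    return abs(state + rng.uniform(-noise, noise)) < tolerance


def mc_success_estimate(state: float,
                        rollout: Callable[[float, random.Random], bool],
                        n_rollouts: int = 200,
                        seed: int = 0) -> float:
    """Monte-Carlo estimate of P(task success | hand control to the nominal controller at `state`)."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    successes = sum(rollout(state, rng) for _ in range(n_rollouts))
    return successes / n_rollouts


if __name__ == "__main__":
    for s in (0.0, 0.4, 5.0):
        print(s, mc_success_estimate(s, toy_nominal_rollout))
```

In this toy setting a state well inside the tolerance scores 1.0, a distant state scores 0.0, and a borderline state such as 0.4 lands near 0.75. This gives a graded signal instead of a single pass/fail outcome. How the paper computes its estimates is not specified in this section.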
Abstract
Model-based planners and controllers are commonly used to solve complex manipulation problems because they can efficiently optimize diverse objectives and generalize to long-horizon tasks. However, they often fail during deployment due to noisy actuation, partial observability, and imperfect models. To enable a robot to recover from such failures, we propose to use hierarchical reinforcement learning to learn a recovery policy. The recovery policy is triggered when a failure is detected from sensory observations and seeks to take the robot to a state from which it can complete the task using the nominal model-based controllers. Our approach, called RecoveryChaining, uses a hybrid action space in which the model-based controllers are provided as additional nominal options. This allows the recovery policy to decide how to recover, when to switch to a nominal controller, and which controller to switch to, even with sparse rewards. We evaluate our approach on three multi-step manipulation tasks with sparse rewards, where it learns significantly more robust recovery policies than those learned by baselines. We successfully transfer recovery policies learned in simulation to a physical robot, demonstrating the feasibility of sim-to-real transfer with our method.
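To make the hybrid action space concrete, here is a toy sketch under stated assumptions. It uses a hypothetical 1-D environment (`ToyRecoveryEnv`) in which the recovery policy chooses either a `Primitive` displacement or a `NominalOption` that hands control to one of two model-based controllers. Each controller is assumed to succeed only inside its own precondition region. Rewards are sparse: 1.0 is given only when a nominal option completes the task. Ending the episode after any nominal option is our simplifying assumption. The policy is scripted purely for illustration, whereas RecoveryChaining learns the recovery policy with reinforcement learning. All names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Tuple, Union


@dataclass(frozen=True)
class Primitive:
    delta: float  # small displacement applied directly by the recovery policy


@dataclass(frozen=True)
class NominalOption:
    controller_id: int  # which (hypothetical) model-based controller takes over


Action = Union[Primitive, NominalOption]


class ToyRecoveryEnv:
    """Hypothetical 1-D task: the task succeeds when the robot ends at x = 0."""

    def __init__(self, start: float,
                 precondition_radii: Tuple[float, ...] = (1.0, 0.3)) -> None:
        self.x = start
        # Assumption: controller i succeeds iff started with |x| <= precondition_radii[i].
        self.radii = precondition_radii

    def step(self, action: Action) -> Tuple[float, float, bool]:
        if isinstance(action, Primitive):
            self.x += action.delta
            return self.x, 0.0, False  # sparse reward: no signal for primitive steps
        # Nominal option: temporally extended, runs its controller to termination.
        ok = abs(self.x) <= self.radii[action.controller_id]
        if ok:
            self.x = 0.0
        return self.x, (1.0 if ok else 0.0), True  # assumption: episode ends either way


def scripted_recovery(x: float, step: float = 0.25) -> Action:
    """Illustrative stand-in for a learned policy: approach, then hand off."""
    if abs(x) <= 1.0:  # inside controller 0's assumed precondition region
        return NominalOption(controller_id=0)
    return Primitive(delta=-step if x > 0 else step)


def run_episode(env: ToyRecoveryEnv, policy: Callable[[float], Action],
                max_steps: int = 50) -> float:
    """Roll out `policy` until a nominal option terminates the episode or the step cap is hit."""
    total = 0.0
    for _ in range(max_steps):
        _, reward, done = env.step(policy(env.x))
        total += reward
        if done:
            break
    return total


if __name__ == "__main__":
    print(run_episode(ToyRecoveryEnv(2.0), scripted_recovery))  # prints 1.0
```

The scripted policy moves toward the larger precondition region and then switches to controller 0. Invoking a nominal option from outside its precondition yields zero reward. This is the decision (when to switch, and to which controller) that the learned policy must make from sparse feedback.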
