Residual Reinforcement Learning from Demonstrations

Minttu Alakuijala, Gabriel Dulac-Arnold, Julien Mairal, Jean Ponce, Cordelia Schmid

TL;DR

This work tackles the challenge of data-efficient policy learning for robotics under visual inputs and sparse rewards. It introduces Residual Reinforcement Learning from Demonstrations (RRLfD), which first learns a base policy from demonstrations via behavioral cloning on image and proprioceptive data, then uses RL to learn a lightweight residual policy that applies corrective actions on top of the base policy, which stays fixed during residual training. Empirically, RRLfD demonstrates improved generalization to unseen environments and faster task completion on high-dimensional manipulation tasks, outperforming both BC alone and RL-from-scratch baselines, and showing favorable data efficiency and stability. The method is broadly applicable to continuous-control problems and offers a practical path to leveraging demonstrations for vision-based robotics without requiring hand-crafted state estimators or controllers.

Abstract

Residual reinforcement learning (RL) has been proposed as a way to solve challenging robotic tasks by adapting control actions from a conventional feedback controller to maximize a reward signal. We extend the residual formulation to learn from visual inputs and sparse rewards using demonstrations. Learning from images, proprioceptive inputs and a sparse task-completion reward relaxes the requirement of accessing full state features, such as object and target positions. In addition, replacing the base controller with a policy learned from demonstrations removes the dependency on a hand-engineered controller in favour of a dataset of demonstrations, which can be provided by non-experts. Our experimental evaluation on simulated manipulation tasks on a 6-DoF UR5 arm and a 28-DoF dexterous hand demonstrates that residual RL from demonstrations is able to generalize to unseen environment conditions more flexibly than either behavioral cloning or RL fine-tuning, and is capable of solving high-dimensional, sparse-reward tasks out of reach for RL from scratch.
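
To make the residual formulation concrete, the following is a minimal sketch of how a fixed base policy and a learned residual might be combined at action time. It is not the authors' implementation: the class names, the linear-plus-tanh models, the `scale` bound, the zero initialization of the residual, and the clipping range are all assumptions chosen for illustration. The only elements taken from the text are that the base policy comes from behavioral cloning, that it stays fixed, and that the residual builds on the features the base policy has learned.

```python
import numpy as np


class BasePolicy:
    """Hypothetical stand-in for the behavioral-cloning policy (kept fixed)."""

    def __init__(self, w_feat, w_act):
        self.w_feat = w_feat  # maps observation -> learned features
        self.w_act = w_act    # maps features -> base action

    def features(self, obs):
        return np.tanh(self.w_feat @ obs)

    def act(self, obs):
        return np.tanh(self.w_act @ self.features(obs))


class ResidualPolicy:
    """Hypothetical residual head; only these weights would be updated by RL."""

    def __init__(self, w_res, scale=0.1):
        self.w_res = w_res
        self.scale = scale  # assumed bound on the size of the correction

    def act(self, features):
        return self.scale * np.tanh(self.w_res @ features)


def combined_action(base, residual, obs, low=-1.0, high=1.0):
    """Base action plus residual correction, clipped to an assumed action range."""
    a_base = base.act(obs)
    a_res = residual.act(base.features(obs))
    return np.clip(a_base + a_res, low, high)


rng = np.random.default_rng(0)
obs_dim, feat_dim, act_dim = 8, 4, 6  # illustrative sizes only

base = BasePolicy(
    rng.normal(size=(feat_dim, obs_dim)),
    rng.normal(size=(act_dim, feat_dim)),
)
# Zero-initialized residual (an assumption): the combined policy starts out
# identical to the base policy and departs from it only as RL updates w_res.
residual = ResidualPolicy(np.zeros((act_dim, feat_dim)))

obs = rng.normal(size=obs_dim)
action = combined_action(base, residual, obs)
```

With the residual initialized to zero, the first rollouts reproduce the base policy's behavior, which is one common way to let RL start from the demonstrated behavior rather than from random exploration.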

Paper Structure

This paper contains 16 sections, 6 equations, 6 figures, 1 table.

Figures (6)

  • Figure 1: a) We propose a way to leverage demonstration data to learn a control policy as well as task-relevant visual features through behavioral cloning on image and proprioceptive inputs. b) The policy is then improved through reinforcement learning by a superimposed residual policy, based on the learned visual features, allowing data-efficient learning of control policies in image space from sparse rewards.
  • Figure 2: We evaluate RRLfD on seven manipulation tasks on two different robotic simulation platforms: a 6-DoF UR5 arm (a–c) and a 28-DoF ShadowHand model (d–g).
  • Figure 3: BC success rates (as %) evaluated on 100 unseen initial states (10 seeds, 95% confidence intervals).
  • Figure 4: Success rates of residual policies as a function of base policy success rate (mean of 5 seeds, 95% confidence intervals).
  • Figure 5: Success rates for the residual agent over training (5 seeds, 95% confidence intervals).
  • ...and 1 more figure