Residual Learning from Demonstration: Adapting DMPs for Contact-rich Manipulation

Todor Davchev, Kevin Sebastian Luck, Michael Burke, Franziska Meier, Stefan Schaal, Subramanian Ramamoorthy

TL;DR

The paper addresses robust contact-rich insertion tasks by integrating demonstration-based Dynamic Movement Primitives (DMPs) with reinforcement learning through residual corrections in task space. It extends residual learning to the full pose using quaternion-based orientation corrections and demonstrates that nonlinear, full-pose residual policies significantly improve accuracy, generalization, and transfer with sparse rewards in both simulation and real-robot experiments. Key contributions include a comprehensive comparison of DMP adaptation strategies (C1), a framework for full-pose residual learning (C2), and empirical evidence that full-pose, nonlinear residuals outperform translation-only or linear approaches (C3). The work offers practical, sample-efficient methods with potential for real-world deployment and cross-task transfer in varied geometries and friction conditions.

Abstract

Manipulation skills involving contact and friction are inherent to many robotics tasks. Using the class of motor primitives for peg-in-hole-like insertions, we study how robots can learn such skills. Dynamic Movement Primitives (DMPs) are a popular way of extracting such policies through behaviour cloning (BC) but can struggle in the context of insertion. Policy adaptation strategies such as residual learning can help improve the overall performance of policies in the context of contact-rich manipulation. However, it is not clear how best to do this with DMPs. As a result, we consider several possible ways of adapting a DMP formulation and propose "residual Learning from Demonstration" (rLfD), a framework that combines DMPs with Reinforcement Learning (RL) to learn a residual correction policy. Our evaluations suggest that applying residual learning directly in task space and operating on the full pose of the robot can significantly improve the overall performance of DMPs. We show that rLfD offers a solution that is gentle to the joints, improves the task success and generalisation of DMPs, and enables transfer to different geometries and frictions through few-shot task adaptation. The proposed framework is evaluated on a set of tasks in which a simulated robot and a physical robot have to successfully insert pegs, gears and plugs into their respective sockets. Other material and videos accompanying this paper are provided at https://sites.google.com/view/rlfd/.
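To make the idea of a full-pose residual in task space concrete, here is a minimal sketch of how a correction could be composed with a DMP pose command. It is an illustration under assumptions, not the authors' implementation: the function names, the (w, x, y, z) quaternion ordering, the rotation-vector parameterisation of the orientation residual, and the choice to apply the rotation on the left are all hypothetical.

```python
import numpy as np

def quat_multiply(q1, q2):
    """Hamilton product of two quaternions in (w, x, y, z) order."""
    w1, x1, y1, z1 = q1
    w2, x2, y2, z2 = q2
    return np.array([
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    ])

def quat_exp(rot_vec):
    """Map a rotation vector (axis * angle, in radians) to a unit quaternion."""
    angle = np.linalg.norm(rot_vec)
    if angle < 1e-12:
        return np.array([1.0, 0.0, 0.0, 0.0])  # identity rotation
    axis = rot_vec / angle
    return np.concatenate(([np.cos(angle / 2.0)], np.sin(angle / 2.0) * axis))

def apply_residual(dmp_pos, dmp_quat, delta_pos, delta_rot):
    """Correct a DMP pose command with a residual (hypothetical helper).

    Translation is corrected additively; orientation is corrected by
    composing a small rotation with the DMP quaternion (assumed convention).
    """
    pos = dmp_pos + delta_pos
    quat = quat_multiply(quat_exp(delta_rot), dmp_quat)
    return pos, quat / np.linalg.norm(quat)  # renormalise against drift

# Example: a 1 cm shift along x and a 5 degree rotation about z.
pos, quat = apply_residual(
    np.array([0.4, 0.0, 0.3]),
    np.array([1.0, 0.0, 0.0, 0.0]),
    np.array([0.01, 0.0, 0.0]),
    np.array([0.0, 0.0, np.deg2rad(5.0)]),
)
```

Composing the orientation residual multiplicatively keeps the result a unit quaternion, which simply adding a correction to the quaternion components would not guarantee.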

Paper Structure

This paper contains 31 sections, 4 equations, 5 figures, 6 tables.

Figures (5)

  • Figure 1: Left: outline of the proposed framework. We collect demonstrations using an HTC Vive tracker, and extract an initial full-pose policy using Dynamic Movement Primitives (DMPs, running at 100 Hz). The control command produced by the DMP is corrected by an additional residual policy trained using model-free RL (running at 10 Hz). The resulting motor command is then fed into a real-time impedance controller (running at 500 Hz) in a Franka Panda arm that performs peg, gear or LAN cable insertion in our physical setup. Right: the peg (top), gear (middle) and LAN cable (bottom) insertion tasks considered in this work.
  • Figure 2: Types of exploration perturbations with Gaussian noise, $\eta$, for a simple Archimedean spiral. The perturbations are applied to a translational DMP policy (purple - A), shown as equation above. Perturbing directly in task space (green - D) results in local exploration that is important for contact-rich manipulation. In contrast, perturbing the phase-modulated coupling term, $C_t$ (blue - B), or the parameters $\omega$ of the forcing term, $f_{\omega}$ (orange - C), results in locally smooth trajectories. The unperturbed DMP is depicted in purple.
  • Figure 3: An easy task (left) and a harder task (right). The robot is initialised with a position sampled uniformly within $\pm$12cm along all axes of the initial position of the demo. Difficulty is defined by the size of the hole. A task is complete when a peg is fully inserted.
  • Figure 4: Comparison between using residual, hybrid, DMP and model-free policies. When succeeding, the residual policy (green) consistently experiences generalised forces comparable to those experienced using only a DMP (red). When failing to insert, it experiences even smaller forces across both the easy and hard tasks. Lower is better.
  • Figure 5: Successful peg, gear and RJ-45 insertions. For peg insertion, the x axis shows the distance in cm from the initial position (illustrated by a red vertical line). For gear and RJ-45 insertion, the x axis shows degrees away from the initial orientation. Higher is better.
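
The contrast drawn in the Figure 2 caption, between perturbing directly in task space and perturbing policy parameters, can be illustrated with a small sketch. This is not the paper's DMP: the spiral is generated analytically, and perturbing the hypothetical spiral parameter `b` is only a stand-in for perturbing the parameters of a DMP forcing term. The noise scales and the `roughness` measure are likewise assumptions chosen for illustration.

```python
import numpy as np

def spiral(theta, b=0.01):
    """Archimedean spiral r = b * theta, returned as (N, 2) x-y points."""
    r = b * theta
    return np.stack([r * np.cos(theta), r * np.sin(theta)], axis=1)

def roughness(traj):
    """Mean norm of the second finite difference; larger means less smooth."""
    return np.linalg.norm(np.diff(traj, n=2, axis=0), axis=1).mean()

rng = np.random.default_rng(0)
theta = np.linspace(0.0, 4.0 * np.pi, 200)

# Nominal, unperturbed trajectory.
nominal = spiral(theta)

# Task-space perturbation: independent Gaussian noise on every point,
# which produces local, jittery exploration around the nominal path.
task_space = nominal + rng.normal(scale=0.002, size=nominal.shape)

# Parameter perturbation (stand-in): a single Gaussian draw shifts the
# spiral parameter b, so the whole trajectory changes but stays smooth.
param_space = spiral(theta, b=0.01 + rng.normal(scale=0.002))

print("roughness (nominal):    ", roughness(nominal))
print("roughness (task space): ", roughness(task_space))
print("roughness (parameters): ", roughness(param_space))
```

In this toy setting the task-space variant deviates locally at every step, while the parameter variant yields a different but still smooth curve, which mirrors the qualitative distinction the caption describes.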