Learning from Successful and Failed Demonstrations via Optimization

Brendan Hertel, S. Reza Ahmadzadeh

TL;DR

The paper tackles sub-optimal demonstrations in Learning from Demonstration by proposing Trajectory Learning from Failed and Successful Demonstrations (TLFSD), which jointly models successful and failed data with Gaussian mixtures and Gaussian Mixture Regression to form quadratic costs. These costs are scalarized to balance convergence to successful examples with divergence from failed ones, and a constrained quadratic-penalty optimization yields reproduced trajectories that respect user-defined via/initial/final points while remaining smooth through an elastic regularizer. Key contributions include handling empty data subsets, robust obstacle avoidance driven by failed demonstrations, and a multi-coordinate extension that preserves curvature features; empirical results on 2D/3D tasks and real UR5e experiments demonstrate improved performance over GMM/GMR-wEM and conventional LfD methods. The proposed approach offers a data-efficient alternative to RL-based methods by exploiting both successes and failures to reproduce skills under varied constraints, with potential for iterative human-in-the-loop refinement and cross-coordinate feature encoding.

Abstract

Learning from Demonstration (LfD) is a popular approach that allows humans to teach robots new skills by showing the correct way(s) of performing the desired skill. Human-provided demonstrations, however, are not always optimal, and the teacher usually addresses this issue by discarding or replacing sub-optimal (noisy or faulty) demonstrations. We propose a novel LfD representation that learns from both successful and failed demonstrations of a skill. Our approach encodes the two subsets of captured demonstrations (labeled by the teacher) into a statistical skill model, constructs a set of quadratic costs, and finds an optimal reproduction of the skill under novel problem conditions (i.e., constraints). The optimal reproduction balances convergence towards successful examples and divergence from failed examples. We evaluate our approach through several 2D and 3D experiments in the real world using a UR5e manipulator arm and also show that it can reproduce a skill from only failed demonstrations. The benefits of exploiting both failed and successful demonstrations are shown through comparison with two existing LfD approaches. We also compare our approach against an existing skill refinement method and show its capabilities in a multi-coordinate setting.
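
The balance between convergence and divergence described above can be illustrated with a small numerical sketch. This is not the authors' TLFSD implementation: it assumes the statistical model has already been reduced to one mean trajectory per subset (`mu_s` for successful, `mu_f` for failed), ignores covariances, uses a second-difference smoothness term as a stand-in for the elastic regularizer, and enforces point constraints with a quadratic penalty. The function name `reproduce` and the weights `w_s`, `w_f`, `w_e`, `w_c` are hypothetical.

```python
import numpy as np


def reproduce(mu_s, mu_f, w_s=1.0, w_f=0.3, w_e=5.0, w_c=1e4, constraints=None):
    """Minimize a scalarized quadratic cost over a trajectory x (n points, d dims):

        w_s*||x - mu_s||^2 - w_f*||x - mu_f||^2 + w_e*||D x||^2
        + w_c * sum_i ||x_i - p_i||^2   (over constrained indices i)

    All names and weights are illustrative assumptions, not the paper's notation.
    """
    mu_s = np.asarray(mu_s, dtype=float)
    mu_f = np.asarray(mu_f, dtype=float)
    if w_s <= w_f:
        # w_s > w_f is a sufficient condition for the cost to stay convex.
        raise ValueError("w_s must exceed w_f so the quadratic cost stays convex")

    n = mu_s.shape[0]
    # Second-order finite-difference operator; penalizes bending of the path.
    D = np.zeros((n - 2, n))
    for i in range(n - 2):
        D[i, i:i + 3] = (1.0, -2.0, 1.0)

    # Setting the gradient to zero gives the linear system A x = b.
    A = (w_s - w_f) * np.eye(n) + w_e * (D.T @ D)
    b = w_s * mu_s - w_f * mu_f

    # Quadratic penalty pulling selected points towards user-defined targets.
    for idx, point in (constraints or {}).items():
        A[idx, idx] += w_c
        b[idx] += w_c * np.asarray(point, dtype=float)

    return np.linalg.solve(A, b)


# Toy 2D data: the successful mean is a straight line, the failed mean bulges upward.
t = np.linspace(0.0, 1.0, 50)
mu_s = np.column_stack([t, np.zeros_like(t)])
mu_f = np.column_stack([t, 0.2 * np.sin(np.pi * t)])

free = reproduce(mu_s, mu_f)  # no constraints
pinned = reproduce(mu_s, mu_f, constraints={0: (0.0, 0.0), 49: (1.0, 0.5)})
```

In this toy setup the unconstrained result bends away from the failed bulge, while the constrained result additionally passes close to the requested initial and final points. Raising `w_f` towards `w_s` strengthens the divergence, and raising `w_e` flattens the path.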

Paper Structure

This paper contains 16 sections, 7 equations, 7 figures, 1 table.

Figures (7)

  • Figure 1: Learning from successful (green) and failed (red) demonstrations results in reproductions that can avoid obstacles by diverging from the failed demonstration sets (details in Sec. \ref{sec:Exp_B}).
  • Figure 2: Workflow of the proposed approach.
  • Figure 3: Similarities between successful and failed demonstrations vary locally.
  • Figure 4: (a) results of using various combinations of failed and successful demonstrations in the presence of constraints. (b) and (c) comparison between TLFSD and GMM/GMR-wEM in a simulated pushing task. (d) comparison between DMPs, LTE, and TLFSD in a simulated writing task.
  • Figure 5: (a) and (b) TLFSD adapting to the change in scene when an obstacle is placed, resulting in successful reproductions for a reaching task. (c) and (d) TLFSD finds a successful reproduction for a pushing task with no constraints given.
  • ...and 2 more figures