Learning from Successful and Failed Demonstrations via Optimization

Brendan Hertel; S. Reza Ahmadzadeh

TL;DR

The paper addresses sub-optimal demonstrations in Learning from Demonstration (LfD) by proposing Trajectory Learning from Failed and Successful Demonstrations (TLFSD). TLFSD jointly models the successful and failed demonstration sets with Gaussian mixture models (GMM) and Gaussian mixture regression (GMR), and derives quadratic costs from the resulting models. These costs are scalarized to balance convergence toward successful examples against divergence from failed ones. A constrained optimization with quadratic penalties then yields reproductions that satisfy user-defined initial, final, and via-points, while an elastic regularizer keeps them smooth. Key contributions include handling an empty data subset (for example, reproducing a skill from failed demonstrations only), obstacle avoidance driven by failed demonstrations, and a multi-coordinate extension that preserves curvature features. Results on 2D and 3D tasks, including real-world experiments with a UR5e manipulator, show improved performance over GMM/GMR-wEM and conventional LfD methods. The approach offers a data-efficient alternative to RL-based methods by exploiting both successes and failures to reproduce skills under varied constraints, with potential for iterative human-in-the-loop refinement and cross-coordinate feature encoding.

Abstract

Learning from Demonstration (LfD) is a popular approach that allows humans to teach robots new skills by showing the correct way(s) of performing the desired skill. Human-provided demonstrations, however, are not always optimal, and the teacher usually addresses this issue by discarding or replacing sub-optimal (noisy or faulty) demonstrations. We propose a novel LfD representation that learns from both successful and failed demonstrations of a skill. Our approach encodes the two subsets of captured demonstrations (labeled by the teacher) into a statistical skill model, constructs a set of quadratic costs, and finds an optimal reproduction of the skill under novel problem conditions (i.e., constraints). The optimal reproduction balances convergence towards successful examples and divergence from failed examples. We evaluate our approach through several real-world 2D and 3D experiments using a UR5e manipulator arm and also show that it can reproduce a skill from only failed demonstrations. The benefits of exploiting both failed and successful demonstrations are shown through comparison with two existing LfD approaches. We also compare our approach against an existing skill refinement method and show its capabilities in a multi-coordinate setting.
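
To make the balance between convergence and divergence concrete, here is a minimal sketch of a scalarized quadratic cost solved in closed form. It is not the authors' implementation and makes several simplifying assumptions: the mean trajectories mu_s and mu_f are passed in directly rather than estimated with GMM/GMR, all points are weighted equally instead of by a learned covariance, a second-difference penalty stands in for the elastic regularizer, and point constraints are enforced with a quadratic penalty. The function name reproduce and the weights alpha, beta, gamma, and rho are hypothetical.

```python
import numpy as np


def reproduce(mu_s, mu_f, constraints, alpha=1.0, beta=0.3, gamma=5.0, rho=1e4):
    """Minimize an illustrative scalarized quadratic cost over a trajectory x (N x d).

    J(x) = alpha * ||x - mu_s||^2      (converge to the successful mean)
         - beta  * ||x - mu_f||^2      (diverge from the failed mean)
         + gamma * ||D x||^2           (smoothness, D = second-difference operator)
         + rho   * sum_i ||x_i - c_i||^2 over constrained indices i
    """
    mu_s = np.asarray(mu_s, dtype=float)
    mu_f = np.asarray(mu_f, dtype=float)
    n = mu_s.shape[0]

    # Assumption of this sketch: alpha > beta is a sufficient condition for the
    # quadratic to stay strictly convex despite the negative divergence term.
    if alpha <= beta:
        raise ValueError("alpha must exceed beta to keep the cost convex")

    # Second-difference operator used as a simple smoothness regularizer.
    D = np.zeros((n - 2, n))
    for i in range(n - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]

    # Setting the gradient of J to zero gives the linear system A x = b.
    A = (alpha - beta) * np.eye(n) + gamma * D.T @ D
    b = alpha * mu_s - beta * mu_f

    # Quadratic penalties pull the constrained points toward their targets.
    for idx, point in constraints.items():
        A[idx, idx] += rho
        b[idx] += rho * np.asarray(point, dtype=float)

    return np.linalg.solve(A, b)


# Toy 2D example: the successful mean arcs upward, the failed mean arcs downward.
t = np.linspace(0.0, 1.0, 50)
mu_s = np.stack([t, np.sin(np.pi * t)], axis=1)
mu_f = np.stack([t, -np.sin(np.pi * t)], axis=1)

# Hypothetical initial and final point constraints.
x = reproduce(mu_s, mu_f, {0: [0.0, 0.0], 49: [1.0, 0.2]})
```

Because every term is quadratic, the minimizer comes from a single linear solve. The divergence term is unbounded in this simplified form, which is why the sketch requires alpha > beta; a fuller treatment would weight each term using the learned skill model.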
