The Art of Imitation: Learning Long-Horizon Manipulation Tasks from Few Demonstrations

Jan Ole von Hartz; Tim Welschehold; Abhinav Valada; Joschka Boedecker

TL;DR

This work proposes to factorize the robot's end-effector velocity into its direction and magnitude and to model both using Riemannian GMMs. It further presents a method to automatically detect the relevant task parameters for each skill from visual observations.
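The Python sketch below illustrates the factorization. It splits a 3D velocity into a scalar magnitude and a unit direction on the sphere $\mathcal{S}^2$. It also implements the standard log and exponential maps of $\mathcal{S}^2$ and a weighted Fréchet (Karcher) mean, which is the kind of operation a Riemannian GMM needs to estimate component means of directions. This is a minimal sketch that assumes NumPy. The function names, the fallback direction for zero velocity, and the iteration settings are our own choices, not the authors' implementation.

```python
import numpy as np


def factorize_velocity(v, eps=1e-8):
    """Split a 3D end-effector velocity into (magnitude, unit direction).

    The direction lies on the unit sphere S^2. For (near-)zero velocity the
    direction is undefined; this sketch falls back to the z-axis (assumption).
    """
    v = np.asarray(v, dtype=float)
    mag = np.linalg.norm(v)
    if mag < eps:
        return 0.0, np.array([0.0, 0.0, 1.0])
    return float(mag), v / mag


def sphere_log(p, q, eps=1e-12):
    """Log map on S^2: tangent vector at p pointing to q, length = geodesic distance.

    Undefined for antipodal points; this sketch then returns the zero vector.
    """
    cos_theta = np.clip(np.dot(p, q), -1.0, 1.0)
    theta = np.arccos(cos_theta)
    u = q - cos_theta * p  # component of q orthogonal to p
    norm_u = np.linalg.norm(u)
    if norm_u < eps:
        return np.zeros_like(p)
    return theta * u / norm_u


def sphere_exp(p, v, eps=1e-12):
    """Exp map on S^2: follow the geodesic from p along tangent vector v."""
    theta = np.linalg.norm(v)
    if theta < eps:
        return np.array(p, dtype=float)
    return np.cos(theta) * p + np.sin(theta) * v / theta


def sphere_mean(points, weights=None, iters=50, tol=1e-10):
    """Weighted Fréchet mean on S^2 via fixed-point iteration in the tangent space."""
    points = np.asarray(points, dtype=float)
    if weights is None:
        weights = np.ones(len(points))
    weights = np.asarray(weights, dtype=float) / np.sum(weights)
    mean = points[0].copy()  # initialize at the first sample
    for _ in range(iters):
        # Average the data in the tangent space at the current estimate ...
        step = sum(w * sphere_log(mean, x) for w, x in zip(weights, points))
        # ... and map the averaged tangent vector back onto the sphere.
        mean = sphere_exp(mean, step)
        if np.linalg.norm(step) < tol:
            break
    return mean
```

For example, `factorize_velocity([0.0, 0.3, 0.4])` gives a magnitude of about 0.5 and the direction `[0, 0.6, 0.8]`, and `sphere_exp(p, sphere_log(p, d))` recovers `d`. Averaging in the tangent space keeps the mean on the sphere. Averaging unit vectors in Euclidean space would not.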

Abstract

Task-Parameterized Gaussian Mixture Models (TP-GMMs) are a sample-efficient method for learning object-centric robot manipulation tasks. However, several open challenges remain in applying TP-GMMs in the wild. In this work, we tackle three crucial challenges synergistically. First, end-effector velocities are non-Euclidean and thus hard to model using standard GMMs. We therefore propose to factorize the robot's end-effector velocity into its direction and magnitude, and to model them using Riemannian GMMs. Second, we leverage the factorized velocities to segment and sequence skills from complex demonstration trajectories. Through the segmentation, we further align skill trajectories and hence leverage time as a powerful inductive bias. Third, we present a method to automatically detect the relevant task parameters per skill from visual observations. Our approach enables learning complex manipulation tasks from just five demonstrations while using only RGB-D observations. Extensive experimental evaluations on RLBench demonstrate that our approach achieves state-of-the-art performance with a 20-fold improvement in sample efficiency. Our policies generalize across different environments, object instances, and object positions, and the learned skills are reusable.
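TP-GMMs, which this approach builds on, encode each skill in several local task frames and combine those frames at inference time. The sketch below shows the standard Euclidean fusion step from the TP-GMM literature. It assumes each frame $j$ is given by a linear map $A_j$ and an offset $b_j$. Each local Gaussian is mapped to the world frame as $\hat\mu_j = A_j\mu_j + b_j$ and $\hat\Sigma_j = A_j\Sigma_j A_j^\top$, and the results are multiplied: $\hat\Sigma = (\sum_j \hat\Sigma_j^{-1})^{-1}$ and $\hat\mu = \hat\Sigma \sum_j \hat\Sigma_j^{-1}\hat\mu_j$. As a result, each frame dominates the dimensions where it is most certain. This is a simplified illustration: the paper uses a Riemannian variant, and the frames and numbers used below are hypothetical.

```python
import numpy as np


def to_global(mu_local, sigma_local, A, b):
    """Map a Gaussian expressed in task frame (A, b) into the world frame."""
    mu = A @ mu_local + b
    sigma = A @ sigma_local @ A.T
    return mu, sigma


def product_of_gaussians(mus, sigmas):
    """Fuse per-frame Gaussians by multiplying their densities.

    Precisions (inverse covariances) add up; the fused mean is the
    precision-weighted combination of the per-frame means.
    """
    precisions = [np.linalg.inv(s) for s in sigmas]
    sigma = np.linalg.inv(sum(precisions))
    mu = sigma @ sum(p @ m for p, m in zip(precisions, mus))
    return mu, sigma
```

Consider two frames, both with $A = I$. The first sits at the origin and is confident only in $x$ (local mean $(1, 0)$, covariance $\mathrm{diag}(0.01, 1)$). The second is offset by $b = (0, 2)$ and is confident only in $y$ (local mean $(0, 0)$, covariance $\mathrm{diag}(1, 0.01)$). The fused mean is $(100/101, 200/101) \approx (0.99, 1.98)$, and the fused covariance is $I/101$.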

Paper Structure

This paper contains 36 sections, 25 equations, 12 figures, and 5 tables.

Introduction
Related Work
Background
Gaussian Mixture Model
Time-Driven vs. State-Driven Models
Task-Parameterization
Riemannian Manifolds
Technical Approach
Action Factorization
Gripper Action
Skill Segmentation
Skill Sequencing and Skill Reuse
Time-Based Initialization
Task-Parameterization
Candidate Generation
...and 21 more sections

Figures (12)

Figure 1: TAPAS-GMM: Task Auto-Parameterized And Skill Segmented GMM learns task-parameterized manipulation policies from only a handful of complex task demonstrations. First, we segment the full task demonstrations into the involved skills. For each segment, we then automatically select the relevant task parameters and learn a Riemannian Task-Parameterized Hidden Markov Model (TP-HMM). The skill models can be cascaded and reused flexibly. To enable modeling of the robot's end-effector velocity, we further leverage a novel action factorization and Riemannian geometry.
Figure 2: Velocity trajectories from the first skill in StackWine. Left: The velocities are difficult to cluster in Euclidean space. Middle: The movement directions, modeled on the sphere $\mathcal{S}^2$. Right: The associated action magnitudes.
Figure 3: Left: GMM on unaligned demos. Right: GMM on aligned demos. Dotted lines indicate segment borders.
Figure 4: Overview of our approach. Learning: First, we segment a set of complex task demonstrations with unaligned skills. Next, we generate a set of candidate task parameters from visual observations and select the relevant parameters for each segment. Finally, we fit one Task-Parameterized Hidden Markov Model (TP-HMM) per segment. Inference: To make a prediction for new visual observations, we again extract the candidate task parameters and keep those selected during the learning phase. We then cascade the segment TP-HMMs.
Figure 5: Velocity-based skill segmentation on two noisy robot trajectories for a pouring task. Red ellipses indicate candidate segmentation points that were filtered out: the first set is too close to the start of the trajectory before the robot begins moving, while the second set results from noise in the trajectory. The final segmentation points are the centers of extended sub-threshold segments.
...and 7 more figures
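The Python sketch below approximates the velocity-based segmentation from the Figure 5 caption and the alignment from the Figure 3 caption. It assumes the per-step end-effector speed is already available, for example as the factorized magnitude. Several details are hypothetical choices for illustration, not the paper's parameters: the threshold, the minimum run length, the start margin, the rule that drops a final stop at the trajectory end, and the normalized phase column.

```python
import numpy as np


def segment_by_velocity(speeds, threshold, min_run, start_margin):
    """Return segmentation indices from a 1D array of end-effector speeds.

    Candidates are runs of consecutive samples with speed below `threshold`.
    A run is discarded if it starts within `start_margin` samples of the
    beginning (robot not moving yet), if it is shorter than `min_run`
    samples (treated as noise), or if it reaches the last sample (the end of
    the trajectory is already a boundary; an assumption of this sketch).
    Each surviving run contributes one segmentation point at its center.
    """
    below = np.asarray(speeds, dtype=float) < threshold
    n = len(below)
    points = []
    t = 0
    while t < n:
        if not below[t]:
            t += 1
            continue
        start = t
        while t < n and below[t]:
            t += 1
        end = t  # exclusive end of the sub-threshold run
        if start >= start_margin and end - start >= min_run and end < n:
            points.append((start + end - 1) // 2)
    return points


def split_and_phase(traj, points):
    """Split a (T, D) trajectory at `points` and prepend a per-segment phase.

    The phase runs from 0 to 1 within every segment, so a skill that starts
    at different times in different demonstrations shares a common time axis.
    """
    segments = np.split(np.asarray(traj, dtype=float), points)
    return [np.column_stack([np.linspace(0.0, 1.0, len(s)), s]) for s in segments]
```

Consider the speed profile `[0, 0, 0, 1, 1, 1, 0.05, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0]` with `threshold=0.1`, `min_run=3`, and `start_margin=2`. The initial rest, the single-sample dip, and the final stop are all filtered out, which leaves one segmentation point at index 10.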
