MILES: Making Imitation Learning Easy with Self-Supervision

Georgios Papagiannis, Edward Johns

TL;DR

This work proposes MILES, a fully autonomous, self-supervised data collection paradigm that serves as an alternative to human-intensive data collection in imitation learning, and shows that it enables efficient policy learning from just a single demonstration and a single environment reset.

Abstract

Data collection in imitation learning often requires significant, laborious human supervision, such as numerous demonstrations, and/or frequent environment resets for methods that incorporate reinforcement learning. In this work, we propose an alternative approach, MILES: a fully autonomous, self-supervised data collection paradigm, and we show that this enables efficient policy learning from just a single demonstration and a single environment reset. MILES autonomously learns a policy for returning to and then following the single demonstration, whilst being self-guided during data collection, eliminating the need for additional human interventions. We evaluated MILES across several real-world tasks, including tasks that require precise contact-rich manipulation such as locking a lock with a key. We found that, under the constraints of a single demonstration and no repeated environment resetting, MILES significantly outperforms state-of-the-art alternatives like imitation learning methods that leverage reinforcement learning. Videos of our experiments and code can be found on our webpage: www.robot-learning.uk/miles.
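The "return to, then follow" structure described above is made concrete by the fusion step in Figure 3: each recorded return path is joined with the part of the demonstration that comes after the waypoint it returns to. The block below is a minimal sketch under stated assumptions, not the authors' code. Poses are plain tuples, `AugmentationTrajectory`, `fuse` and `build_dataset` are hypothetical names, each augmentation trajectory is assumed to end at the waypoint whose index it stores, and keeping the original demonstration in the dataset is likewise an assumption.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Hypothetical state representation: an end-effector position as a tuple.
Pose = Tuple[float, ...]


@dataclass
class AugmentationTrajectory:
    """A recorded return path whose last state is demonstration waypoint `waypoint_idx`."""
    waypoint_idx: int
    poses: List[Pose]


def fuse(demo: Sequence[Pose], aug: AugmentationTrajectory) -> List[Pose]:
    """Append the demonstration segment that follows the waypoint the robot returned to."""
    if not 0 <= aug.waypoint_idx < len(demo):
        raise IndexError("waypoint_idx is outside the demonstration")
    # The return path already ends at demo[waypoint_idx], so continue from the next waypoint.
    return list(aug.poses) + list(demo[aug.waypoint_idx + 1:])


def build_dataset(demo: Sequence[Pose], augs: Sequence[AugmentationTrajectory]) -> List[List[Pose]]:
    """Original demonstration (an assumption) plus one fused trajectory per augmentation."""
    return [list(demo)] + [fuse(demo, a) for a in augs]


if __name__ == "__main__":
    demo = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    aug = AugmentationTrajectory(1, [(1.0, 0.4, 0.0), (1.0, 0.0, 0.0)])
    print(fuse(demo, aug))
```

Running the example prints `[(1.0, 0.4, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]`: the robot starts off the demonstration, returns to waypoint 1, and then follows the rest of the demonstration, which is the kind of trajectory a behavioural-cloning policy can be trained on.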

Paper Structure

This paper contains 43 sections, 8 figures, 4 tables.

Figures (8)

  • Figure 1: (a) Behavioural cloning from a single demonstration fails to generalize to states outside the demonstration, due to covariate shift. (b) Providing multiple demonstrations addresses this, but requires significant human effort. (c) While incorporating reinforcement learning addresses the issue of covariate shift and the need for multiple demonstrations, it requires frequent environment resetting and is highly inefficient due to random exploration. (d) In MILES, we propose a new self-supervised paradigm that overcomes these issues and can learn a range of complex tasks from a single demonstration and no additional human effort, by collecting augmentation trajectories that guide the robot back to the demonstration.
  • Figure 2: MILES Overview: (1) First, the user provides a single demonstration and (2) resets the environment only once. (3) Then, the robot autonomously collects self-supervised data. Several augmentation trajectories are collected for each demonstration waypoint until an environment disturbance is detected or sufficient data has been collected for all waypoints. Each augmentation trajectory is either a straight line, if the motion occurs in free space, or a more complex, curved path, since collisions with the environment (e.g., with the lock shown above) can reshape it. (3) (a-b) To collect an augmentation trajectory, the robot first moves from a demonstration waypoint to a random pose. (c) It then attempts to return to the waypoint while recording RGB images and force-torque feedback. (d) After completing the trajectory, we check whether the achieved state satisfies the reachability and environment-disturbance conditions.
  • Figure 3: After finishing the data collection, each augmentation trajectory is fused with the demonstration segment following the demonstration waypoint it returns to, to create a dataset of new demonstration trajectories.
  • Figure 4: The tasks used in our experiments. The "Markers in Bin" task is used to evaluate MILES' ability to generalize (bins marked green denote the training set, and those marked red denote the test set).
  • Figure 5: MILES' performance when trained on vision only, force feedback only, or both.
  • ...and 3 more figures
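The data collection loop in the Figure 2 caption can be sketched as a small simulation. This is an illustrative sketch only, not the authors' implementation. `execute` (commands a pose and returns the pose actually reached, so contacts can reshape the path) and `disturbed` (stands in for the environment-disturbance check) are hypothetical callables. Only poses are recorded, whereas MILES records RGB images and force-torque feedback. The straight-line return plan, the tolerance-based reachability test and the `max_attempts` cap are further assumptions.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

Pose = Tuple[float, ...]  # hypothetical: end-effector position only


def interpolate(a: Pose, b: Pose, steps: int) -> List[Pose]:
    """Straight-line return path from a to b (excluding a, ending at b)."""
    return [tuple(ai + (bi - ai) * t / steps for ai, bi in zip(a, b)) for t in range(1, steps + 1)]


def collect_augmentations(
    demo: Sequence[Pose],
    execute: Callable[[Pose], Pose],
    disturbed: Callable[[], bool],
    per_waypoint: int = 3,
    max_offset: float = 0.05,
    max_attempts: int = 10,
    tol: float = 1e-3,
    steps: int = 5,
    seed: int = 0,
) -> Dict[int, List[List[Pose]]]:
    """Collect return trajectories per waypoint until enough data or a disturbance."""
    rng = random.Random(seed)
    data: Dict[int, List[List[Pose]]] = {i: [] for i in range(len(demo))}
    for idx, waypoint in enumerate(demo):
        attempts = 0
        while len(data[idx]) < per_waypoint and attempts < max_attempts:
            attempts += 1
            # (a-b) Move from the waypoint to a random nearby pose.
            target = tuple(w + rng.uniform(-max_offset, max_offset) for w in waypoint)
            start = execute(target)
            # (c) Try to return, recording the pose actually reached at each step.
            traj = [start] + [execute(p) for p in interpolate(start, waypoint, steps)]
            # (d) A disturbance ends all collection; unreachable returns are discarded.
            if disturbed():
                return data
            if math.dist(traj[-1], waypoint) <= tol:
                data[idx].append(traj)
    return data


if __name__ == "__main__":
    demo = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0)]
    free_space = lambda p: p  # hypothetical: every commanded pose is reached exactly
    data = collect_augmentations(demo, free_space, disturbed=lambda: False)
    print({i: len(t) for i, t in data.items()})
```

In free space every return succeeds, so the example prints `{0: 3, 1: 3}`. Replacing `execute` with a function that clamps poses against an obstacle would bend the recorded paths, mirroring the collision-reshaped trajectories the caption describes. Each kept trajectory could then be fused with the demonstration segment after its waypoint, as in Figure 3.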