ARCADE: Scalable Demonstration Collection and Generation via Augmented Reality for Imitation Learning

Yue Yang, Bryce Ikeda, Gedas Bertasius, Daniel Szafir

TL;DR

The Augmented Reality for Collection and generAtion of DEmonstrations (ARCADE) framework scales up demonstration collection for robot manipulation tasks and automatically generates additional synthetic demonstrations from a single human-derived demonstration, significantly reducing user effort and time.

Abstract

Robot Imitation Learning (IL) is a crucial technique in robot learning, where agents learn by mimicking human demonstrations. However, IL encounters scalability challenges stemming from both non-user-friendly demonstration collection methods and the extensive time required to amass a sufficient number of demonstrations for effective training. In response, we introduce the Augmented Reality for Collection and generAtion of DEmonstrations (ARCADE) framework, designed to scale up demonstration collection for robot manipulation tasks. Our framework combines two key capabilities: 1) it leverages AR to make demonstration collection as simple as users performing daily tasks using their hands, and 2) it enables the automatic generation of additional synthetic demonstrations from a single human-derived demonstration, significantly reducing user effort and time. We assess ARCADE's performance on a real Fetch robot across three robotics tasks: 3-Waypoints-Reach, Push, and Pick-And-Place. Using our framework, we were able to rapidly train a policy using vanilla Behavioral Cloning (BC), a classic IL algorithm, which excelled across these three tasks. We also deploy ARCADE on a real household task, Pouring-Water, achieving an 80% success rate.

Paper Structure

This paper contains 18 sections, 1 equation, 6 figures, 2 algorithms.

Figures (6)

  • Figure 1: This figure shows the architecture of ARCADE. (I) First, a user provides a single demonstration, $\tau^{AR}$, through AR. (II) We generate a new demonstration, $\tau^{new}$, by following sampled poses, extracted from $\tau^{AR}$, and key poses, obtained via Key-Poses Detector. (III) Additional candidate demonstrations are then generated, which are visualized in AR for user validation. Users filter the candidate demonstrations to form an accepted set of generated demonstrations, $\Xi^{accepted}$. (IV) Finally, we continue generating additional new demonstrations, automatically determining whether to keep or discard each demonstration based on comparing it to $\Xi^{accepted}$ via Automatic Validation.
  • Figure 2: Left: Egocentric view showing the robot's end effector overlapping with and following the human hand's movements to perform the Push task. Right: Exocentric view showing how the human performs the task manually, with the digital twin robot end effector mirroring the hand movements.
  • Figure 3: Visualized candidate demonstrations may exhibit behaviors that could lead to rejection by the user: (left) unnatural motions due to poor IK solutions, (middle) potentially hazardous motions, (right) misalignment with the user's preferences.
  • Figure 4: We evaluate on three tasks: Left: 3-Waypoints-Reach, Middle: Push, Right: Pick-And-Place.
  • Figure 5: The results of BC policies trained using ARCADE or a kinesthetic teaching baseline (BL) with either 1 or 100 demonstrations across three tasks. The full ARCADE system (the 100-demonstration set, $\Xi^{scale}$) offers the best performance in all tasks.
  • ...and 1 more figure
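
Stages (II) to (IV) of the Figure 1 caption can be illustrated with a rough sketch. This is not the authors' implementation: the function names, the uniform random sampling of poses, the mean point-wise distance metric, and the `threshold` parameter are all assumptions made for illustration, and the user validation of stage (III) is represented only by an already-filtered `accepted` list. The key-pose indices are taken as given, since the caption does not describe how the Key-Poses Detector works.

```python
import math
import random


def generate_candidate(tau_ar, key_idx, k, rng):
    """Hypothetical stage (II): build a new demonstration from k poses
    sampled from tau_ar plus every key pose, kept in temporal order."""
    non_key = [i for i in range(len(tau_ar)) if i not in key_idx]
    chosen = set(rng.sample(non_key, k)) | set(key_idx)
    return [tau_ar[i] for i in sorted(chosen)]


def demo_distance(a, b):
    """Assumed metric: mean Euclidean distance between corresponding poses
    of two equal-length demonstrations."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def auto_validate(candidate, accepted, threshold):
    """Hypothetical stage (IV): keep a candidate if it lies within
    `threshold` of at least one user-accepted demonstration."""
    return min(demo_distance(candidate, d) for d in accepted) <= threshold


def scale_up(tau_ar, key_idx, accepted, n_target, k, threshold,
             seed=0, max_tries=10_000):
    """Keep generating candidates until n_target demonstrations are
    collected or max_tries candidates have been tried."""
    rng = random.Random(seed)
    scaled = list(accepted)  # start from the user-accepted set
    for _ in range(max_tries):
        if len(scaled) >= n_target:
            break
        candidate = generate_candidate(tau_ar, key_idx, k, rng)
        if auto_validate(candidate, accepted, threshold):
            scaled.append(candidate)
    return scaled


if __name__ == "__main__":
    # Toy AR demonstration: 20 end-effector positions along a straight line.
    tau_ar = [(0.05 * i, 0.0, 0.0) for i in range(20)]
    key_idx = {0, 19}  # assumed key poses: start and goal
    rng = random.Random(1)
    accepted = [generate_candidate(tau_ar, key_idx, 5, rng) for _ in range(3)]
    scaled = scale_up(tau_ar, key_idx, accepted, n_target=100, k=5, threshold=0.2)
    print(len(scaled), "demonstrations collected")
```

In this sketch each candidate is only compared against the user-accepted set, not against previously auto-accepted candidates, so the user's judgments remain the sole reference for what counts as acceptable.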