AdaDemo: Data-Efficient Demonstration Expansion for Generalist Robotic Agent

Tongzhou Mu; Yijie Guo; Jie Xu; Ankit Goyal; Hao Su; Dieter Fox; Animesh Garg

TL;DR

AdaDemo presents an adaptive online framework for data-efficient demonstration expansion to train a single generalist visual policy across multiple robotic tasks. By iteratively evaluating the policy, collecting demonstrations only for failed initial states and hard tasks, and adaptively sampling the expanded dataset, AdaDemo achieves superior data efficiency relative to uniform data collection across RLBench and Adroit. The approach demonstrates progressive performance gains across rounds and provides ablations showing the value of focusing on failures and hard tasks. These findings highlight the practical potential of targeted demonstration collection for scalable, multi-task robotic imitation learning in simulated settings where demonstrations can be gathered efficiently.
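The evaluate, collect-on-failure, and re-weight loop described above can be illustrated with a toy sketch. This is not the authors' implementation. The names `train_toy_policy`, `find_failures`, `expand`, `task_sampling_weights`, and `run_adaptive_expansion`, the per-task budget, and the failure-rate weighting with a floor are all hypothetical choices made for illustration. Policy training is replaced by a deterministic stand-in so the example runs without a simulator.

```python
from typing import Callable, Dict, List, Set, Tuple

Demo = Tuple[str, int]  # (task, initial-state seed); stand-in for a full trajectory
Policy = Callable[[str, int], bool]  # returns True if the rollout succeeds


def train_toy_policy(dataset: Set[Demo], base_skill: Dict[str, int]) -> Policy:
    """Hypothetical stand-in for training: the policy succeeds on seeds below
    the task's base skill, or on any initial state that has a demonstration."""
    def policy(task: str, seed: int) -> bool:
        return seed < base_skill[task] or (task, seed) in dataset
    return policy


def find_failures(policy: Policy, tasks: List[str], seeds: List[int]) -> Dict[str, List[int]]:
    """Evaluate the policy online and record the failed initial states per task."""
    return {t: [s for s in seeds if not policy(t, s)] for t in tasks}


def expand(dataset: Set[Demo], failures: Dict[str, List[int]], budget_per_task: int) -> Set[Demo]:
    """Collect new demos only on failed initial states, capped per task (assumed cap)."""
    new = {(t, s) for t, failed in failures.items() for s in failed[:budget_per_task]}
    return dataset | new


def task_sampling_weights(failures: Dict[str, List[int]], n_eval: int, floor: float = 0.1) -> Dict[str, float]:
    """Assumed sampling rule: weight tasks by failure rate plus a small floor."""
    raw = {t: len(f) / n_eval + floor for t, f in failures.items()}
    total = sum(raw.values())
    return {t: w / total for t, w in raw.items()}


def run_adaptive_expansion(tasks, seeds, base_skill, rounds, budget_per_task):
    """Iterate: train, evaluate, then expand the dataset where the policy fails."""
    dataset: Set[Demo] = set()
    history: List[float] = []  # multi-task success rate measured in each round
    for _ in range(rounds):
        policy = train_toy_policy(dataset, base_skill)
        failures = find_failures(policy, tasks, seeds)
        n_fail = sum(len(f) for f in failures.values())
        history.append(1 - n_fail / (len(tasks) * len(seeds)))
        dataset = expand(dataset, failures, budget_per_task)
    return dataset, history


if __name__ == "__main__":
    data, hist = run_adaptive_expansion(
        ["easy", "hard"], list(range(10)), {"easy": 8, "hard": 2},
        rounds=3, budget_per_task=4,
    )
    print(hist, len(data))
```

In this toy setting the harder task ends up with more demonstrations than the easy one, which is the qualitative behavior the TL;DR attributes to targeted collection. The numbers carry no relation to the reported RLBench or Adroit results.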

Abstract

Encouraged by the remarkable achievements of language and vision foundation models, developing generalist robotic agents through imitation learning on large demonstration datasets has become a prominent area of interest in robot learning. The efficacy of imitation learning relies heavily on the quantity and quality of the demonstration datasets. In this study, we aim to scale up demonstrations in a data-efficient way to facilitate the learning of generalist robotic agents. We introduce AdaDemo (Adaptive Online Demonstration Expansion), a general framework designed to improve multi-task policy learning by actively and continually expanding the demonstration dataset. AdaDemo strategically collects new demonstrations to address the identified weaknesses of the existing policy, thereby maximizing data efficiency. Through a comprehensive evaluation on a total of 22 tasks across two robotic manipulation benchmarks (RLBench and Adroit), we demonstrate AdaDemo's capability to progressively improve policy performance by guiding the generation of high-quality demonstration datasets in a data-efficient manner.

Paper Structure (29 sections, 3 figures, 5 tables, 1 algorithm)

Introduction
Related Work
Problem Setup
Adaptive Online Demo Expansion
Overview
Online Demonstration Expansion
Iterative Improvement Process
Adaptive Demonstration Expansion
Demo Collection on Failed Initial States
Demo Collection on Unsolved Tasks
Sampling Strategy in the Collected Dataset
Experiments
Experimental Setup
Environments: RLBench
Task Description
...and 14 more sections

Figures (3)

Figure 1: Comparison of data efficiency between AdaDemo (adaptively expanding the demo dataset) and Uniform (collecting more demonstrations uniformly). After reaching a mediocre success rate of 57% on RLBench and 62% on Adroit, Uniform gains only a slightly better success rate despite a large increase in the number of demonstrations. While the baseline's performance plateaus, AdaDemo continues to improve multi-task performance iteratively. Overall, AdaDemo achieves better performance with only 1/2 of the data on RLBench and 1/3 of the data on Adroit. This data efficiency could translate into substantial cost savings in large-scale demonstration collection.
Figure 2: AdaDemo iteratively expands the demonstration dataset through online evaluation of the trained policy, adaptively collecting additional demonstrations to target cases where the multi-task policy most needs improvement.
Figure 3: Tasks: We consider challenging and diverse robotic manipulation tasks spanning two benchmarks: RLBench (table-top robot arm manipulation) and Adroit (dexterous manipulation).
