AdaDemo: Data-Efficient Demonstration Expansion for Generalist Robotic Agent
Tongzhou Mu, Yijie Guo, Jie Xu, Ankit Goyal, Hao Su, Dieter Fox, Animesh Garg
TL;DR
AdaDemo is an adaptive online framework for data-efficient demonstration expansion, used to train a single generalist visual policy across multiple robotic tasks. In each round it evaluates the current policy, collects new demonstrations only for failed initial states and hard tasks, and samples adaptively from the expanded dataset during training. On both RLBench and Adroit, AdaDemo is more data-efficient than uniform data collection. Performance improves progressively across rounds, and ablations show the value of focusing on failures and hard tasks. These findings highlight the practical potential of targeted demonstration collection for scalable, multi-task robotic imitation learning in simulated settings where demonstrations can be gathered efficiently.
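The evaluate, collect, and resample loop summarized above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`expansion_round`, `sampling_weights`), the `hard_threshold` cutoff, the inverse-success weighting rule, and the toy policy are all hypothetical choices made only to show the shape of one round.

```python
from collections import defaultdict


def expansion_round(tasks, init_states_for, policy_succeeds, collect_demo,
                    dataset, hard_threshold=0.75):
    """One hypothetical round: evaluate, then collect demos only where needed.

    hard_threshold is an assumed cutoff, not a value from the paper.
    """
    rates = {}
    for task in tasks:
        states = init_states_for(task)
        # Evaluate the current policy and remember the initial states it fails on.
        failed = [s for s in states if not policy_succeeds(task, s)]
        rates[task] = 1.0 - len(failed) / len(states)
        # Collect new demonstrations only for hard tasks, and only for failed states.
        if rates[task] < hard_threshold:
            dataset[task].extend(collect_demo(task, s) for s in failed)
    return rates


def sampling_weights(rates, eps=0.05):
    """Assumed adaptive sampling rule: lower success rate gives higher weight."""
    raw = {task: (1.0 - rate) + eps for task, rate in rates.items()}
    total = sum(raw.values())
    return {task: w / total for task, w in raw.items()}


# Toy setup (entirely made up): the "policy" solves every "easy" state
# but only the even-numbered "hard" states.
dataset = defaultdict(list)
rates = expansion_round(
    tasks=["easy", "hard"],
    init_states_for=lambda task: list(range(10)),
    policy_succeeds=lambda task, s: task == "easy" or s % 2 == 0,
    collect_demo=lambda task, s: (task, s),  # stand-in for a real demonstration
    dataset=dataset,
)
weights = sampling_weights(rates)
```

In this toy run, new demonstrations are gathered only for the odd-numbered initial states of the "hard" task, and the sampling weights shift toward that task. A real system would retrain the policy on the expanded dataset between rounds and repeat.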
Abstract
Encouraged by the remarkable achievements of language and vision foundation models, developing generalist robotic agents through imitation learning on large demonstration datasets has become a prominent area of interest in robot learning. The efficacy of imitation learning relies heavily on the quantity and quality of the demonstration datasets. In this study, we aim to scale up demonstrations in a data-efficient way to facilitate the learning of generalist robotic agents. We introduce AdaDemo (Adaptive Online Demonstration Expansion), a general framework designed to improve multi-task policy learning by actively and continually expanding the demonstration dataset. AdaDemo strategically collects new demonstrations to address weaknesses identified in the existing policy, thereby maximizing data efficiency. Through a comprehensive evaluation on a total of 22 tasks across two robotic manipulation benchmarks (RLBench and Adroit), we demonstrate AdaDemo's capability to progressively improve policy performance by guiding the generation of high-quality demonstration datasets in a data-efficient manner.
