Adversarial Data Collection: Human-Collaborative Perturbations for Efficient and Robust Robotic Imitation Learning
Siyuan Huang, Yue Liao, Siyuan Feng, Shu Jiang, Si Liu, Hongsheng Li, Maoqing Yao, Guanghui Ren
TL;DR
The paper tackles data efficiency in real-world robotic imitation learning by introducing Adversarial Data Collection (ADC), a two-person, human-in-the-loop framework. In ADC, an adversarial operator injects real-time visual and linguistic perturbations while a teleoperator completes the task, which maximizes the information content of each demonstration. By formalizing data-unit density for Vision-Language-Action (VLA) models and building adversarial perturbations into collection, ADC substantially improves compositional generalization and robustness. It reaches strong performance with only 20% of the demonstration volume used by conventional collection and covers a broader range of tasks. Experiments with both conventional policies and VLA models show improved robustness to perceptual perturbations, dynamic environments, and sensor failures, along with improved attention patterns and more diverse observations. The authors are also curating an ADC-Robotics dataset, to be open-sourced, to support further research. Overall, the work argues for a practical shift toward data quality and human-in-the-loop perturbation as a scalable route to generalization in embodied agents.
Abstract
The pursuit of data efficiency, where quality outweighs quantity, has emerged as a cornerstone of robotic manipulation, especially given the high cost of real-world data collection. We propose that maximizing the informational density of individual demonstrations can dramatically reduce reliance on large-scale datasets while improving task performance. To this end, we introduce Adversarial Data Collection (ADC), a Human-in-the-Loop (HiL) framework that redefines robotic data acquisition through real-time, bidirectional human-environment interaction. Unlike conventional pipelines that passively record static demonstrations, ADC adopts a collaborative perturbation paradigm: within a single episode, an adversarial operator dynamically alters object states, environmental conditions, and linguistic commands, while the teleoperator adaptively adjusts their actions to overcome these evolving challenges. This process compresses diverse failure-recovery behaviors, compositional task variations, and environmental perturbations into a small number of demonstrations. Our experiments show that ADC-trained models achieve superior compositional generalization to unseen task instructions, enhanced robustness to perceptual perturbations, and emergent error-recovery capabilities. Strikingly, models trained on only 20% of the demonstration volume, collected through ADC, significantly outperform traditional approaches trained on full datasets. These advances bridge the gap between data-centric learning paradigms and practical robotic deployment, demonstrating that strategic data acquisition, not merely post-hoc processing, is critical for scalable, real-world robot learning. Additionally, we are curating a large-scale ADC-Robotics dataset of real-world manipulation tasks with adversarial perturbations. This benchmark will be open-sourced to facilitate advances in robotic imitation learning.
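The following minimal Python sketch illustrates the collaborative perturbation paradigm. It shows how a single ADC-style episode might be logged so that one demonstration covers several task conditions that static collection would need separate episodes to capture. Everything here is a hypothetical illustration, not the authors' data format or their data-unit density measure. This includes the `Episode` class, the three perturbation labels (which mirror the categories named above), and the idea of splitting an episode into condition segments.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical labels for the three perturbation types named in the abstract.
KINDS = ("object_state", "environment", "language")


@dataclass
class Episode:
    """One teleoperated demonstration (hypothetical record format)."""
    instruction: str
    num_steps: int
    # (step, kind, detail) tuples logged by the adversarial operator.
    perturbations: List[Tuple[int, str, str]] = field(default_factory=list)

    def perturb(self, step: int, kind: str, detail: str) -> None:
        """Record an intervention by the adversarial operator at `step`."""
        if kind not in KINDS:
            raise ValueError(f"unknown perturbation kind: {kind!r}")
        if not 0 < step < self.num_steps:
            raise ValueError("perturbation must fall strictly inside the episode")
        self.perturbations.append((step, kind, detail))

    def segments(self) -> List[Tuple[int, int, str]]:
        """Split the episode into (start, end, instruction) segments.

        Each segment is a stretch in which the teleoperator acts under one
        fixed condition. A "language" perturbation also replaces the active
        instruction with its `detail` text.
        """
        start, instr, segs = 0, self.instruction, []
        for step, kind, detail in sorted(self.perturbations):
            if step > start:  # skip empty segments from same-step events
                segs.append((start, step, instr))
            start = step
            if kind == "language":
                instr = detail
        segs.append((start, self.num_steps, instr))
        return segs


# A static demonstration versus an ADC-style one of the same length.
static = Episode("pick up the red cup", num_steps=200)
adc = Episode("pick up the red cup", num_steps=200)
adc.perturb(60, "object_state", "cup moved 10 cm left")
adc.perturb(120, "language", "pick up the blue cup")
adc.perturb(160, "environment", "lights dimmed")

print(len(static.segments()), len(adc.segments()))  # prints: 1 4
```

Counting segments per episode is only a crude proxy for how much variation a single demonstration covers. It is not the paper's formalization of data-unit density, which is not reproduced here.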
