Adversarial Data Collection: Human-Collaborative Perturbations for Efficient and Robust Robotic Imitation Learning

Siyuan Huang, Yue Liao, Siyuan Feng, Shu Jiang, Si Liu, Hongsheng Li, Maoqing Yao, Guanghui Ren

TL;DR

The paper tackles data efficiency in real-world robotic imitation learning by introducing Adversarial Data Collection (ADC), a two-humans-in-the-loop framework that injects real-time visual and linguistic perturbations to maximize the information content of each demonstration. By formalizing data-unit density for Vision-Language-Action (VLA) models and integrating adversarial perturbations, ADC substantially improves compositional generalization and robustness, enabling strong performance with only 20% of the demonstration volume used by traditional collection and yielding broader task coverage. Empirical results on conventional policies and VLA models show enhanced robustness to perceptual perturbations, dynamic environments, and sensor failures, alongside improved attention and observation diversity. The authors also plan to open-source an ADC-Robotics dataset to facilitate further research, highlighting a practical shift toward data quality and human-in-the-loop perturbations as a scalable route to embodied generalization.

Abstract

The pursuit of data efficiency, where quality outweighs quantity, has emerged as a cornerstone in robotic manipulation, especially given the high costs associated with real-world data collection. We propose that maximizing the informational density of individual demonstrations can dramatically reduce reliance on large-scale datasets while improving task performance. To this end, we introduce Adversarial Data Collection, a Human-in-the-Loop (HiL) framework that redefines robotic data acquisition through real-time, bidirectional human-environment interactions. Unlike conventional pipelines that passively record static demonstrations, ADC adopts a collaborative perturbation paradigm: during a single episode, an adversarial operator dynamically alters object states, environmental conditions, and linguistic commands, while the tele-operator adaptively adjusts actions to overcome these evolving challenges. This process compresses diverse failure-recovery behaviors, compositional task variations, and environmental perturbations into minimal demonstrations. Our experiments demonstrate that ADC-trained models achieve superior compositional generalization to unseen task instructions, enhanced robustness to perceptual perturbations, and emergent error recovery capabilities. Strikingly, models trained with merely 20% of the demonstration volume collected through ADC significantly outperform traditional approaches using full datasets. These advances bridge the gap between data-centric learning paradigms and practical robotic deployment, demonstrating that strategic data acquisition, not merely post-hoc processing, is critical for scalable, real-world robot learning. Additionally, we are curating a large-scale ADC-Robotics dataset comprising real-world manipulation tasks with adversarial perturbations. This benchmark will be open-sourced to facilitate advancements in robotic imitation learning.
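
The collection loop described above can be illustrated with a toy simulation. This is only a minimal sketch under stated assumptions, not the authors' pipeline: the `Episode` class, the `collect_episode` function, the `perturb_every` parameter, and the example instructions and scenes (other than "grasp the orange", which appears in Figure 5) are all hypothetical. It counts how many distinct (instruction, scene) contexts a single demonstration covers, as a rough stand-in for the idea of information density per demonstration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Episode:
    # Each step stores the (instruction, scene) context the tele-operator acted under.
    steps: List[Tuple[str, str]] = field(default_factory=list)

    def distinct_contexts(self) -> int:
        """Number of unique (instruction, scene) pairs seen in this episode."""
        return len(set(self.steps))


def collect_episode(
    instructions: List[str],
    scenes: List[str],
    num_steps: int,
    perturb_every: Optional[int] = None,
) -> Episode:
    """Simulate one demonstration (hypothetical toy model).

    perturb_every=None mimics traditional collection: one fixed instruction
    and one static scene for the whole episode.
    perturb_every=k mimics ADC: an adversarial operator switches to the next
    instruction and scene every k steps while the tele-operator keeps acting.
    """
    episode = Episode()
    switches = 0
    for t in range(num_steps):
        if perturb_every and t > 0 and t % perturb_every == 0:
            switches += 1  # adversarial operator intervenes
        instruction = instructions[switches % len(instructions)]
        scene = scenes[switches % len(scenes)]
        episode.steps.append((instruction, scene))
    return episode


# Assumed example data; only "grasp the orange" comes from the paper's figures.
instructions = ["grasp the orange", "grasp the apple", "grasp the cup"]
scenes = ["plain table", "cluttered table"]

traditional = collect_episode(instructions, scenes, num_steps=12)
adc = collect_episode(instructions, scenes, num_steps=12, perturb_every=3)

print(traditional.distinct_contexts(), adc.distinct_contexts())  # prints "1 4"
```

In this toy setting both episodes have the same length, but the perturbed one covers four contexts instead of one. The paper's actual data-unit density formulation is not reproduced here.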

Paper Structure

This paper contains 20 sections, 2 equations, 7 figures, 6 tables.

Figures (7)

  • Figure 1: Comparative Analysis of the Real-Data Collection Loop in Robotic Manipulation. (a) Traditional Approach: A tele-operator executes tasks via fixed linguistic instructions in static visual environments. (b) Adversarial Data Collection (ADC) Framework: Employs a Two-Humans-in-the-Loop approach, where a secondary operator dynamically perturbs the tele-operator's execution while a task is in progress. (c) ADC Loop: The adversarial operator introduces visual (backgrounds, object positions/poses) and linguistic (task goals) perturbations, shifting environmental context and target objects within a single episode.
  • Figure 2: The overview of ADC. During training data collection, we introduce several adversarial perturbations, including dynamic visual perturbations and adaptive linguistic challenges. These perturbations increase information density, expand state space coverage, and provide more complete observations of target objects. The resulting high-quality dataset enables the trained policy model to achieve strong robustness and generalization, outperforming models trained with conventional data collection strategies.
  • Figure 3: Hardware setup used in ADC for both data collection and evaluation experiments. The Aloha robot is employed for conventional robotic policy experiments, which include various visual distractors. The AgiBot G1 robot is utilized for the VLA policy experiments, where different dynamic perturbations are applied.
  • Figure 4: Comparison of attention maps when one camera is masked. Models trained with ADC focus more precisely on functional cameras, demonstrating superior attention concentration compared to models trained with traditional data collection pipelines.
  • Figure 5: Comparison of observation coverage for the task "Grasp the orange." In the traditional data collection process, the target object (orange) is observed from similar viewpoints, resulting in limited visual diversity. In contrast, ADC introduces dynamic perturbations, allowing the orange to be observed from a wider range of viewpoints. This leads to greater visual variation in the ADC dataset, improving model robustness and generalization.
  • ...and 2 more figures