Task-Relevant Adversarial Imitation Learning

Konrad Zolna, Scott Reed, Alexander Novikov, Sergio Gomez Colmenarejo, David Budden, Serkan Cabi, Misha Denil, Nando de Freitas, Ziyu Wang

TL;DR

The paper identifies a core flaw in adversarial imitation learning: discriminators can exploit spurious associations between visual features and expert labels, yielding uninformative rewards and poor task performance. It introduces Task-Relevant Adversarial Imitation Learning (TRAIL), which constrains the discriminator with constraining sets, constructed so that only spurious features distinguish them, forcing it to rely on task-relevant information rather than spurious cues. The constraint is implemented as an accuracy-based term in the discriminator loss. On pixel-based robotic manipulation tasks, TRAIL outperforms behavioral cloning, conventional GAIL, and off-policy baselines, and remains robust to appearance changes and distractor objects. The approach offers a practical, data-driven route to imitation learning from vision without task rewards, with direct relevance to real-world robotics.
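
The accuracy-based constraint can be illustrated with a small sketch. It assumes (hypothetically; this summary does not give the exact rule) that the constraining-set term is switched on only when the discriminator separates $\mathcal{I}_E$ from $\mathcal{I}_A$ better than chance, and is then subtracted from the usual GAIL cross-entropy, so that minimizing the loss pushes the discriminator toward confusion on the constraining sets. The function names and the `weight` parameter are illustrative, not taken from the paper, and discriminator outputs are plain lists of probabilities $D(x) = P(\text{expert} \mid x)$.

```python
import math


def bce(probs, labels, eps=1e-7):
    """Mean binary cross-entropy, where probs[i] = D(x_i) = P(expert | x_i)."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


def accuracy(probs, labels):
    """Fraction of examples classified correctly with a 0.5 threshold."""
    correct = sum((p > 0.5) == (y == 1) for p, y in zip(probs, labels))
    return correct / len(probs)


def constrained_discriminator_loss(d_expert, d_agent, d_ie, d_ia, weight=1.0):
    """Hypothetical TRAIL-style discriminator loss (to be minimized).

    d_expert, d_agent: D outputs on expert and agent observations.
    d_ie, d_ia: D outputs on the constraining sets I_E and I_A.
    weight: illustrative scale for the constraint term (an assumption).
    """
    # Standard GAIL discriminator objective: expert -> 1, agent -> 0.
    loss = bce(d_expert + d_agent, [1] * len(d_expert) + [0] * len(d_agent))

    # I_E and I_A differ only in spurious features, so beating chance on
    # them means D is relying on spurious cues.
    c_probs = d_ie + d_ia
    c_labels = [1] * len(d_ie) + [0] * len(d_ia)
    if accuracy(c_probs, c_labels) > 0.5:
        # Subtracting the cross-entropy means that minimizing the loss raises
        # D's error on the constraining sets, unlearning the spurious boundary.
        loss -= weight * bce(c_probs, c_labels)
    return loss
```

Gating the term on accuracy means it only acts while the discriminator can still tell the constraining sets apart; once it is at chance, the extra term switches off instead of pushing the discriminator toward systematically flipped labels.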

Abstract

We show that a critical vulnerability in adversarial imitation is the tendency of discriminator networks to learn spurious associations between visual features and expert labels. When the discriminator focuses on task-irrelevant features, it does not provide an informative reward signal, leading to poor task performance. We analyze this problem in detail and propose a solution that outperforms standard Generative Adversarial Imitation Learning (GAIL). Our proposed method, Task-Relevant Adversarial Imitation Learning (TRAIL), uses constrained discriminator optimization to learn informative rewards. In comprehensive experiments, we show that TRAIL can solve challenging robotic manipulation tasks from pixels by imitating human operators without access to any task rewards, and clearly outperforms comparable baseline imitation agents, including those trained via behaviour cloning and conventional GAIL.
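
Why a discriminator that keys on task-irrelevant features gives an uninformative reward can be shown with a toy example. The sketch assumes the common GAIL-style reward $r(s) = -\log(1 - D(s))$ (this summary does not state the paper's exact reward) and reduces each observation to two hypothetical scalar features: `task_progress` in $[0, 1]$ and a binary `spurious_flag` standing in for a task-irrelevant visual difference between expert and agent data. The logistic `discriminator` is a stand-in, not the paper's network.

```python
import math


def gail_reward(d_prob, eps=1e-7):
    """Assumed GAIL-style reward r = -log(1 - D(s)), D(s) = P(expert | s)."""
    d = min(max(d_prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -math.log(1.0 - d)


def discriminator(task_progress, spurious_flag, w_task, w_spurious):
    """Hypothetical logistic discriminator over two hand-made features."""
    z = w_task * (task_progress - 0.5) + w_spurious * (spurious_flag - 0.5)
    return 1.0 / (1.0 + math.exp(-z))


def reward_spread(w_task, w_spurious):
    """Range of rewards the agent sees as its task progress goes from 0 to 1.

    Agent observations always carry spurious_flag = 0, so only task
    progress varies across these observations.
    """
    rewards = [gail_reward(discriminator(p, 0, w_task, w_spurious))
               for p in (0.0, 0.25, 0.5, 0.75, 1.0)]
    return max(rewards) - min(rewards)


print("task-focused D:", reward_spread(4.0, 0.0))
print("spurious D:", reward_spread(0.0, 20.0))
```

The task-focused discriminator's reward rises by about 2 nats as progress goes from 0 to 1, while the spurious discriminator assigns every agent observation the same reward, so the agent receives no signal about task progress.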

Paper Structure

This paper contains 20 sections, 6 equations, 19 figures, 3 tables, 2 algorithms.

Figures (19)

  • Figure 1: The decision boundary generated by a GAIL discriminator based on both task-relevant and spurious features. As the agent improves (more intense red dots), it produces observations closer to the expert w.r.t. task-relevant features. As a result, the discriminator decision boundary must increasingly rely on spurious features.
  • Figure 2: TRAIL agents solving a variety of manipulation tasks, including tasks with distractor objects (b, d). A comparable GAIL agent can solve lifting (a) but fails when distractor objects are added (b-d), because the additional objects trigger the formation of spurious associations in the discriminator. A video showing TRAIL and GAIL agents performing these tasks can be watched at the video link provided in the paper.
  • Figure 3: On the left, we illustrate a trained boundary separated into spurious and task-relevant components. Ideally, to provide informative rewards, our discriminator should only consist of the task-relevant component. On the right, constraining sets $\mathcal{I}_E$ and $\mathcal{I}_A$ are constructed such that only spurious features can be used to discriminate between them, which isolates the spurious decision boundary. Intuitively, our method works by unlearning this spurious boundary, so that the discriminator better captures the task-relevant boundary.
  • Figure 4: Two workspaces: Jaco (left), which uses the Jaco arm and is 20 $\times$ 20 cm, and Sawyer (right), which uses the Sawyer arm, more closely resembles a real robot cage, and is 35 $\times$ 35 cm.
  • Figure 5: Results comparing TRAIL, GAIL+AES and baselines for diverse manipulation tasks.
  • ...and 14 more figures
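
The idea behind Figures 1 and 3, that constraining sets isolate the spurious part of the decision boundary, can be checked on a toy model. The sketch assumes each observation is reduced to two hypothetical features, `task_progress` and a binary `spurious_flag`, and builds $\mathcal{I}_E$ and $\mathcal{I}_A$ with identical task progress (0.0, as if no task progress has been made) but different spurious flags. The logistic `discriminator` and the choice of zero progress are illustrative assumptions, not the paper's construction.

```python
import math


def discriminator(task_progress, spurious_flag, w_task, w_spurious):
    """Hypothetical logistic discriminator over two hand-made features."""
    z = w_task * (task_progress - 0.5) + w_spurious * (spurious_flag - 0.5)
    return 1.0 / (1.0 + math.exp(-z))


def constraining_set_accuracy(w_task, w_spurious, n=10):
    """Accuracy of D at separating I_E (label 1) from I_A (label 0).

    Both sets share the same task progress and differ only in the spurious
    flag, so any accuracy above 0.5 must come from the spurious feature.
    """
    i_e = [(0.0, 1)] * n  # expert-like appearance, no task progress
    i_a = [(0.0, 0)] * n  # agent-like appearance, no task progress
    correct = 0
    for obs in i_e:
        correct += discriminator(*obs, w_task, w_spurious) > 0.5
    for obs in i_a:
        correct += discriminator(*obs, w_task, w_spurious) <= 0.5
    return correct / (2 * n)


for name, w_t, w_s in [("task-only", 4.0, 0.0),
                       ("spurious-only", 0.0, 20.0),
                       ("mixed", 4.0, 20.0)]:
    print(name, constraining_set_accuracy(w_t, w_s))
```

The task-only discriminator stays at chance (0.5) on the constraining sets, while the spurious-only and mixed discriminators reach 1.0. Penalizing above-chance accuracy on these sets therefore targets only the spurious weight, which matches the intuition in Figure 3 of unlearning the spurious boundary while keeping the task-relevant one.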