InfoGAIL: Interpretable Imitation Learning from Visual Demonstrations

Yunzhu Li, Jiaming Song, Stefano Ermon

Abstract

The goal of imitation learning is to mimic expert behavior without access to an explicit reward signal. Expert demonstrations provided by humans, however, often show significant variability due to latent factors that are typically not explicitly modeled. In this paper, we propose a new algorithm that can infer the latent structure of expert demonstrations in an unsupervised way. Our method, built on top of Generative Adversarial Imitation Learning, can not only imitate complex behaviors, but also learn interpretable and meaningful representations of complex behavioral data, including visual demonstrations. In the driving domain, we show that a model learned from human demonstrations is able to both accurately reproduce a variety of behaviors and accurately anticipate human actions using raw visual inputs. Compared with various baselines, our method can better capture the latent structure underlying expert demonstrations, often recovering semantically meaningful factors of variation in the data.

Paper Structure

This paper contains 24 sections, 5 equations, 7 figures, 2 tables, 2 algorithms.

Figures (7)

  • Figure 1: Learned trajectories in the synthetic 2D plane environment. Each color denotes one specific latent code. Behavior cloning deviates from the expert demonstrations due to compounding errors. GAIL does produce circular trajectories but fails to capture the latent structure, because it assumes that the demonstrations are generated by a single expert and therefore tries to learn an average policy. Our method (InfoGAIL) successfully distinguishes expert behaviors and imitates each mode accordingly (colors are ordered in accordance with the expert for visualization purposes, but are not identifiable).
  • Figure 2: Visualizing the training process of turn. Here we show the trajectories of InfoGAIL at different stages of training. Blue and red indicate policies under different latent codes, which correspond to "turning from inner lane" and "turning from outer lane" respectively. The rightmost figure shows the trajectories under latent codes $[1, 0]$ (red), $[0, 1]$ (blue), and $[0.5, 0.5]$ (purple), which suggests that, to some extent, our method is able to generalize to cases previously unseen in the training data.
  • Figure 3: Experimental results for pass. Left: Trajectories of InfoGAIL at different stages of training (epoch 1 to 37). Blue and red indicate policies using different latent code values, which correspond to passing from right or left. Middle: Traveled distance denotes the absolute distance from the start position, averaged over 60 rollouts of the InfoGAIL policy trained at different epochs. Right: Trajectories of pass produced by an agent trained on the original GAIL objective. Compared to InfoGAIL, GAIL fails to distinguish between different modes.
  • Figure 4: Network architecture for the 2D synthetic environment.
  • Figure 5: Network architecture for the driving experiments.
  • ...and 2 more figures
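
The "traveled distance" metric in the Figure 3 caption (absolute distance from the start position, averaged over rollouts) can be illustrated with a short sketch. This is not the authors' evaluation code: the function name `mean_traveled_distance`, the representation of a rollout as an array of 2D positions, and the choice of Euclidean distance between the first and last position are all assumptions made for illustration.

```python
import numpy as np

def mean_traveled_distance(rollouts):
    """Average distance between start and end position over a set of rollouts.

    rollouts: iterable of arrays with shape (T, 2), one (x, y) position per
    time step. This layout is a hypothetical choice, not taken from the paper.
    """
    distances = []
    for positions in rollouts:
        positions = np.asarray(positions, dtype=float)
        # Assumed reading of "absolute distance from the start position":
        # Euclidean distance between the final and the initial position.
        distances.append(np.linalg.norm(positions[-1] - positions[0]))
    return float(np.mean(distances))

# Two toy rollouts: one ends 5 units from its start, the other 1 unit.
toy_rollouts = [
    np.array([[0.0, 0.0], [3.0, 4.0]]),
    np.array([[1.0, 1.0], [1.0, 2.0]]),
]
average = mean_traveled_distance(toy_rollouts)
```

Under these assumptions, the caption's setting would correspond to passing 60 rollouts of the policy saved at a given epoch and plotting the resulting average against the epoch number.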