Sample-Efficient Imitation Learning via Generative Adversarial Nets

Lionel Blondé; Alexandros Kalousis

TL;DR

The paper addresses the high sample complexity of Generative Adversarial Imitation Learning (GAIL) by introducing Sam, a Sample-efficient Adversarial Imitation Learning framework. Sam takes an off-policy, TD-based approach with deterministic policies and three interacting modules: a discriminator-based reward, a critic, and a policy. All three are trained from a replay buffer so that past experience is reused. By combining a gradient from the learned reward with TD-based policy evaluation and a carefully designed exploration strategy, Sam substantially reduces the number of environment interactions needed to reach expert-like performance while remaining stable. This matters for real-world robotics and other domains where environment interactions are costly or risky. The method stays model-free and keeps the adversarial formulation of GAIL.
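The replay-buffer and TD-based evaluation ideas mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the class and function names are hypothetical, the buffer deliberately stores no rewards (assuming they are recomputed from the current learned reward at sampling time), and the one-step target follows the generic deterministic actor-critic form.

```python
import random
from collections import deque

import numpy as np


class ReplayBuffer:
    """Fixed-capacity FIFO store of (state, action, next_state, done) transitions.

    Assumption: rewards are not stored, because the learned reward changes
    during training and would be recomputed when a batch is sampled.
    """

    def __init__(self, capacity, seed=0):
        self.storage = deque(maxlen=capacity)  # oldest transitions are evicted first
        self.rng = random.Random(seed)

    def add(self, state, action, next_state, done):
        self.storage.append((state, action, next_state, float(done)))

    def sample(self, batch_size):
        # Uniform sampling without replacement from the stored transitions.
        batch = self.rng.sample(list(self.storage), batch_size)
        states, actions, next_states, dones = map(np.array, zip(*batch))
        return states, actions, next_states, dones


def td_target(rewards, dones, next_q, gamma=0.99):
    """One-step TD target: r + gamma * (1 - done) * Q(s', mu(s')).

    `next_q` stands for the critic evaluated at the next state and the
    deterministic policy's action there (both hypothetical placeholders here).
    """
    return rewards + gamma * (1.0 - dones) * next_q
```

In a full training loop, the critic would regress toward `td_target(...)` on sampled batches, and the policy would be updated by ascending the critic's value of its own actions.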

Abstract

GAIL is a recent successful imitation learning architecture that exploits the adversarial training procedure introduced in GANs. Albeit successful at generating behaviours similar to those demonstrated to the agent, GAIL suffers from a high sample complexity in the number of interactions it has to carry out in the environment in order to achieve satisfactory performance. We dramatically shrink the number of interactions with the environment necessary to learn well-behaved imitation policies, by up to several orders of magnitude. Our framework, operating in the model-free regime, exhibits a significant increase in sample-efficiency over previous methods by simultaneously a) learning a self-tuned adversarially-trained surrogate reward and b) leveraging an off-policy actor-critic architecture. We show that our approach is simple to implement and that the learned agents remain remarkably stable, as shown in our experiments that span a variety of continuous control tasks. Video visualisations are available at: https://youtu.be/-nCsqUJnRKU.
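To make the notion of an adversarially trained surrogate reward concrete, here is a small sketch of how a reward can be read off a discriminator's output. The specific form r = -log(1 - D(s, a)) is one common choice in GAIL-style methods and is an assumption here, as is the convention that D close to 1 means "looks like the expert"; the abstract does not state which form the authors use.

```python
import numpy as np


def surrogate_reward(logits, eps=1e-8):
    """Map discriminator logits to a reward, r = -log(1 - D(s, a)).

    Assumptions: D = sigmoid(logit), and higher D means the state-action
    pair looks more expert-like. `eps` guards against log(0).
    """
    d = 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=float)))
    return -np.log(1.0 - d + eps)


# Hypothetical logits for three state-action pairs, from least to most expert-like.
rewards = surrogate_reward([-2.0, 0.0, 2.0])
```

Under these assumptions the reward grows as the discriminator becomes more convinced that a pair came from the expert, which is the signal the policy is trained to increase.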
