An Algorithmic Perspective on Imitation Learning

Takayuki Osa, Joni Pajarinen, Gerhard Neumann, J. Andrew Bagnell, Pieter Abbeel, Jan Peters

TL;DR

This work introduces imitation learning and divides it into two approaches: directly replicating the demonstrated behavior, and learning the hidden objectives behind that behavior from demonstrations (called inverse optimal control or inverse reinforcement learning [Russell, 1998]).

Abstract

As robots and other intelligent agents move from simple environments and problems to more complex, unstructured settings, manually programming their behavior has become increasingly challenging and expensive. Often, it is easier for a teacher to demonstrate a desired behavior rather than attempt to manually engineer it. This process of learning from demonstrations, and the study of algorithms to do so, is called imitation learning. This work provides an introduction to imitation learning. It covers the underlying assumptions, approaches, and how they relate; the rich set of algorithms developed to tackle the problem; and advice on effective tools and implementation. We intend this paper to serve two audiences. First, we want to familiarize machine learning experts with the challenges of imitation learning, particularly those arising in robotics, and the interesting theoretical and practical distinctions between it and more familiar frameworks like statistical supervised learning theory and reinforcement learning. Second, we want to give roboticists and experts in applied artificial intelligence a broader appreciation for the frameworks and tools available for imitation learning.

Paper Structure

This paper contains 132 sections, 157 equations, 60 figures, and 26 tables.

Figures (60)

  • Figure 1: Observations $\boldsymbol{y}$ and control inputs $\boldsymbol{u}$ for imitation learning in (a) helicopter flight, (b) surgery, and (c) locomotion. Motion planning is formulated in different ways in these examples.
  • Figure 2: Illustration of I- and M-projections. Given a distribution with two modes, shown in black, the M-projection gives a solution that averages over the two modes, shown in red. In contrast, the I-projection gives a solution that concentrates on one of the modes.
  • Figure 3: A ski jumper flies through the air using the highly aerodynamic "V-style". "V-style" was adopted by most ski jumpers in the 1990s after some jumpers demonstrated impressive results with the style (public domain picture from Wikimedia Commons).
  • Figure 4: Diagram of general imitation learning. The learner cannot directly observe the expert's policy in many problems. Instead, a set of trajectories induced by the expert's policy is available in imitation learning. The learner estimates the policy that reproduces the expert's behavior using the given demonstrations. Please note that the process of querying the demonstration and updating the learner's policy can be interactive.
  • Figure 5: Illustration of the relationships between basic policy classes. Stationarity is a special case of non-stationarity and determinism is a special case of stochasticity. We use the terms "stationary" and "time-invariant" interchangeably. Likewise, "non-stationary" and "time-variant" are used interchangeably. Please see § 2.5.4 for more details.
  • ...and 55 more figures
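
The mode-averaging and mode-seeking behavior described in the Figure 2 caption can be reproduced numerically. The sketch below is illustrative only and is not taken from the paper. It assumes the standard definitions, which this excerpt does not state: the M-projection minimizes $\mathrm{KL}(p \,\|\, q)$ and the I-projection minimizes $\mathrm{KL}(q \,\|\, p)$ over a single Gaussian $q$. The bimodal target (modes at $-3$ and $+3$), the grid ranges, and all function names are hypothetical choices made for the example.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Density of a 1-D Gaussian evaluated on the grid x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def kl_divergence(a, b, dx):
    """Riemann-sum approximation of KL(a || b) for densities sampled on a grid."""
    tiny = 1e-300                      # guards against log(0)
    mask = a > tiny                    # terms with a == 0 contribute nothing
    return float(np.sum(a[mask] * (np.log(a[mask]) - np.log(b[mask] + tiny))) * dx)

# Assumed bimodal target p: equal mixture of N(-3, 1) and N(+3, 1).
x = np.linspace(-12.0, 12.0, 1201)
dx = x[1] - x[0]
p = 0.5 * gaussian_pdf(x, -3.0, 1.0) + 0.5 * gaussian_pdf(x, 3.0, 1.0)

# Candidate single-Gaussian approximations q(mu, sigma), searched by brute force.
mus = np.linspace(-5.0, 5.0, 101)
sigmas = np.linspace(0.5, 4.0, 71)

def project(kind):
    """Return (mu, sigma) minimizing KL(p||q) for kind 'M' or KL(q||p) for kind 'I'."""
    best = None
    for mu in mus:
        for sigma in sigmas:
            q = gaussian_pdf(x, mu, sigma)
            value = kl_divergence(p, q, dx) if kind == "M" else kl_divergence(q, p, dx)
            if best is None or value < best[0]:
                best = (value, mu, sigma)
    return best[1], best[2]

m_mu, m_sigma = project("M")   # broad Gaussian centered between the modes
i_mu, i_sigma = project("I")   # narrow Gaussian sitting on one mode

print(f"M-projection: mu={m_mu:.2f}, sigma={m_sigma:.2f}")
print(f"I-projection: mu={i_mu:.2f}, sigma={i_sigma:.2f}")
```

Under these assumptions the M-projection reduces to moment matching, so its mean lies between the two modes and its variance covers both. The I-projection has two symmetric minimizers, one per mode, and which one the grid search returns depends only on floating-point tie-breaking.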