Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction

Wen Sun, Arun Venkatraman, Geoffrey J. Gordon, Byron Boots, J. Andrew Bagnell

TL;DR

This paper tackles sequential prediction problems by leveraging near-optimal oracles through differentiable imitation learning (IL). It introduces AggreVaTeD, a differentiable IL framework that uses online gradient and natural gradient updates to exploit an oracle's cost-to-go. This enables faster, and sometimes superior, learning compared to reinforcement learning (RL). The authors give a theoretical analysis showing that IL can enjoy exponentially lower sample complexity than RL, and polynomially lower in other settings. They also demonstrate strong empirical gains on robotics control tasks and on a dependency parsing problem over handwritten algebra. Finally, the work highlights the practical benefits of LSTM-based policies in partially observable settings, supporting the viability of IL for complex, high-dimensional sequential prediction tasks.
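The gradient update mentioned above can be sketched for the simplest case. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes a tabular softmax policy over discrete actions and an oracle cost-to-go table $Q^*(s, a)$. The $Q^*$ values, the list of visited states, and the function names are all hypothetical. Only plain gradient descent is shown; the natural gradient variant is omitted.

The step descends the expected oracle cost $\sum_a \pi(a \mid s) Q^*(s, a)$. Its gradient with respect to the logit of action $b$ in state $s$ is $\pi(b \mid s)\,(Q^*(s, b) - \sum_a \pi(a \mid s) Q^*(s, a))$.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def expected_cost(theta, states, q_star):
    """Mean over visited states of sum_a pi(a|s) * Q*(s, a)."""
    total = 0.0
    for s in states:
        pi = softmax(theta[s])
        total += sum(p * q for p, q in zip(pi, q_star[s]))
    return total / len(states)


def gradient_step(theta, states, q_star, lr):
    """One gradient-descent step on expected_cost for a tabular softmax policy.

    d/d theta[s][b] of sum_a pi(a|s) Q*(s, a) = pi(b|s) * (Q*(s, b) - baseline),
    where baseline = sum_a pi(a|s) Q*(s, a).
    """
    new_theta = {s: list(row) for s, row in theta.items()}
    n = len(states)
    for s in states:  # repeated states accumulate, matching the mean in expected_cost
        pi = softmax(theta[s])
        baseline = sum(p * q for p, q in zip(pi, q_star[s]))
        for b, (p, q) in enumerate(zip(pi, q_star[s])):
            new_theta[s][b] -= lr * p * (q - baseline) / n
    return new_theta


# Hypothetical toy problem: 2 states, 3 actions, oracle cost-to-go Q*(s, a).
q_star = {0: [1.0, 0.2, 0.8], 1: [0.5, 0.9, 0.1]}
theta = {0: [0.0, 0.0, 0.0], 1: [0.0, 0.0, 0.0]}
visited = [0, 1, 0, 1]  # stand-in for states reached by rolling out the current policy

before = expected_cost(theta, visited, q_star)
for _ in range(200):
    theta = gradient_step(theta, visited, q_star, lr=1.0)
after = expected_cost(theta, visited, q_star)
print(f"expected oracle cost: {before:.3f} -> {after:.3f}")
```

In an online procedure, the visited states would be re-collected from the updated policy at every iteration. Fixing them here isolates the update rule. The policy shifts its probability mass toward the action with the lowest oracle cost-to-go in each state.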

Abstract

Researchers have demonstrated state-of-the-art performance in sequential decision-making problems (e.g., robotics control, sequential prediction) with deep neural network models. During training, one often has access to near-optimal oracles that achieve good performance on the task. We demonstrate that AggreVaTeD, a policy gradient extension of the Imitation Learning (IL) approach of Ross & Bagnell (2014), can leverage such an oracle to reach better solutions faster, and with less training data, than a less-informed Reinforcement Learning (RL) technique. Using both feedforward and recurrent neural network predictors, we present stochastic gradient procedures for a sequential prediction task (dependency parsing from raw image data) and for various high-dimensional robotics control problems. We also provide a comprehensive theoretical study of IL. It shows that learning with AggreVaTeD can require up to exponentially fewer samples than RL algorithms, which supports our empirical findings. Our results and theory indicate that the proposed approach can outperform the oracle when the demonstrator is sub-optimal.
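A toy binary tree gives some intuition for the exponential gap claimed above. Figure 1 of the paper also uses a binary-tree MDP, but this sketch is not the paper's construction and does not reproduce its bound.

Assume, hypothetically, a depth-$K$ tree in which exactly one leaf has cost 0 and every other leaf has cost 1. A learner that can query an oracle's cost-to-go at each node reaches the good leaf in a single episode. A learner that sees only terminal costs and enumerates leaves deterministically may need all $2^K$ episodes.

The function names and the 0/1 costs are illustrative assumptions. The blind search is one naive strategy, not a lower bound for every RL algorithm.

```python
import itertools


def oracle_cost_to_go(prefix, goal_leaf):
    """Toy cost-to-go: 0 if the goal leaf is still reachable from this node, else 1."""
    return 0 if goal_leaf[: len(prefix)] == prefix else 1


def descend_with_oracle(depth, goal_leaf):
    """One episode: at each node, take the child with the lower oracle cost-to-go.

    Returns the leaf reached and the number of oracle queries made.
    """
    node, queries = (), 0
    for _ in range(depth):
        left, right = node + (0,), node + (1,)
        queries += 2
        if oracle_cost_to_go(left, goal_leaf) <= oracle_cost_to_go(right, goal_leaf):
            node = left
        else:
            node = right
    return node, queries


def search_without_oracle(depth, goal_leaf):
    """Try one root-to-leaf path per episode, observing only the terminal cost.

    Returns the goal leaf and the number of episodes used (2**depth in the worst case).
    """
    for episodes, leaf in enumerate(itertools.product((0, 1), repeat=depth), start=1):
        if leaf == goal_leaf:  # terminal cost is 0 only at the goal leaf
            return leaf, episodes
    raise ValueError("goal leaf is not in the tree")


depth = 10
goal = (1,) * depth  # worst case for left-to-right enumeration
_, queries = descend_with_oracle(depth, goal)
_, episodes = search_without_oracle(depth, goal)
print(f"depth {depth}: oracle-guided = 1 episode ({queries} queries), blind = {episodes} episodes")
```

With depth 10, the script reports 1 episode (20 oracle queries) for the guided learner versus 1024 episodes for blind search. Each extra level of depth doubles the blind count but adds only two queries to the oracle-guided one.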

Paper Structure

This paper contains 29 sections, 7 theorems, 56 equations, 4 figures, 1 table, 1 algorithm.

Key Result

Theorem 5.1

For $\mathcal{M}$, the regret $R_N$ of any finite-horizon, episodic RL algorithm is at least:

Figures (4)

  • Figure 1: The binary tree structure MDP $\tilde{\mathcal{M}}$.
  • Figure 2: Performance (cumulative reward $R$ on y-axis) versus number of episodes ($n$ on x-axis) of AggreVaTeD (blue and green), experts (red), and RL algorithms (dotted) on different robotics simulators.
  • Figure 3: UAS (y-axis) versus number of iterations ($n$ on x-axis) of AggreVaTeD with LSTM policy (blue and green), experts (red) on validation set and test set for Arc-Eager Parsing.
  • Figure 4: An example of a set of handwritten algebra equations (a) and its corresponding dependency tree (b).
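Figure 3 reports UAS (unlabeled attachment score) for arc-eager parsing. The sketch below assumes the standard arc-eager transition system, with SHIFT, LEFT-ARC, RIGHT-ARC, and REDUCE actions and a ROOT token at index 0. It applies an action sequence to obtain a head for each token. It then scores the result with UAS, the fraction of tokens whose predicted head matches the gold head.

The three-token expression `a + b`, its gold tree, and both action sequences are hypothetical. This is not the paper's parser, feature set, or oracle.

```python
from typing import List, Optional


def arc_eager_heads(n_tokens: int, actions: List[str]) -> List[Optional[int]]:
    """Apply arc-eager transitions; heads[i] is the head of token i (index 0 is ROOT)."""
    heads: List[Optional[int]] = [None] * (n_tokens + 1)
    stack = [0]  # starts with ROOT
    buffer = list(range(1, n_tokens + 1))
    for act in actions:
        if act == "SHIFT":  # move the front of the buffer onto the stack
            stack.append(buffer.pop(0))
        elif act == "LEFT-ARC":  # buffer front becomes head of stack top; pop it
            top = stack[-1]
            if top == 0 or heads[top] is not None:
                raise ValueError("LEFT-ARC precondition violated")
            heads[top] = buffer[0]
            stack.pop()
        elif act == "RIGHT-ARC":  # stack top becomes head of buffer front; push it
            front = buffer.pop(0)
            heads[front] = stack[-1]
            stack.append(front)
        elif act == "REDUCE":  # pop a stack top that already has a head
            if heads[stack[-1]] is None:
                raise ValueError("REDUCE precondition violated")
            stack.pop()
        else:
            raise ValueError(f"unknown action: {act!r}")
    return heads


def uas(pred_heads: List[Optional[int]], gold_heads: List[Optional[int]]) -> float:
    """Unlabeled attachment score over tokens 1..n (unattached tokens count as errors)."""
    tokens = range(1, len(gold_heads))
    correct = sum(1 for i in tokens if pred_heads[i] == gold_heads[i])
    return correct / len(tokens)


# Hypothetical expression "a + b": tokens 1='a', 2='+', 3='b'; ROOT -> '+', '+' -> 'a', '+' -> 'b'.
gold = [None, 2, 0, 2]
good = arc_eager_heads(3, ["SHIFT", "LEFT-ARC", "RIGHT-ARC", "RIGHT-ARC"])
bad = arc_eager_heads(3, ["SHIFT", "RIGHT-ARC", "RIGHT-ARC"])
print(uas(good, gold), uas(bad, gold))
```

The first sequence recovers the gold tree exactly, so it scores a UAS of 1.0. The second attaches only `b` correctly and leaves `a` without a head, so it scores 1/3.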

Theorems & Definitions (12)

  • Theorem 5.1
  • Theorem 5.2
  • Theorem 5.3
  • Theorem 5.4
  • Theorem 5.5
  • Lemma 3.1
  • Lemma 3.2
  • Proof
  • Proof
  • Proof
  • ...and 2 more