Model Predictive Adversarial Imitation Learning for Planning from Observation

Tyler Han; Yanda Bao; Bhaumik Mehta; Gabriel Guo; Anubhav Vishwakarma; Emily Kang; Sanghun Jung; Rosario Scalise; Jason Zhou; Bryan Xu; Byron Boots

TL;DR

This study derives a replacement of the policy in IRL with a planning-based agent, which enables end-to-end interactive learning of planners from observation-only demonstrations, and observes significant improvements in sample efficiency, out-of-distribution generalization, and robustness.

Abstract

Human demonstration data is often ambiguous and incomplete, motivating imitation learning approaches that also exhibit reliable planning behavior. A common paradigm for planning from demonstration involves learning a reward function via Inverse Reinforcement Learning (IRL) and then deploying this reward via Model Predictive Control (MPC). Towards unifying these methods, we derive a replacement of the policy in IRL with a planning-based agent. With connections to Adversarial Imitation Learning, this formulation enables end-to-end interactive learning of planners from observation-only demonstrations. In addition to benefits in interpretability, complexity, and safety, we study and observe significant improvements in sample efficiency, out-of-distribution generalization, and robustness. The study includes evaluations in both simulated control benchmarks and real-world navigation experiments using few-to-single observation-only demonstrations.
