Deep Imitative Models for Flexible Inference, Planning, and Control
Nicholas Rhinehart, Rowan McAllister, Sergey Levine
TL;DR
The paper proposes Deep Imitative Models, which merge imitation learning with goal-directed planning by learning a trajectory density $q(\mathbf{S}_{1:T}|\phi)$ from expert demonstrations and performing posterior inference against a flexible goal likelihood $p(\mathcal{G}|\mathbf{S},\phi)$. Planning is formulated as a MAP problem that maximizes $\log q(\mathbf{S}|\phi) + \log p(\mathcal{G}|\mathbf{S},\phi)$ over trajectories $\mathbf{S}$, yielding multi-step, expert-like trajectories toward novel goals. The approach supports several goal likelihoods, including constrained and unconstrained forms, and can incorporate test-time costs. It achieves state-of-the-art performance on a CARLA driving task while remaining robust to misspecified goals and unseen obstacles. The method is data-efficient and learned offline, and it offers interpretable planning, broad applicability to autonomous control tasks, and potential safety benefits in real-world deployment.
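The MAP objective above can be illustrated with a minimal sketch. It is not the authors' implementation: the learned density $q$ is swapped for a toy isotropic Gaussian around a nominal trajectory, the goal likelihood is assumed to be a Gaussian on the final state, and normalization constants are dropped because they do not affect the maximizer. The names `log_q`, `log_goal`, `plan_map`, and all parameter values are hypothetical.

```python
import numpy as np

def log_q(S, mean, sigma):
    """Toy stand-in for log q(S | phi): isotropic Gaussian around a nominal
    'expert' trajectory (an assumption, not the learned model)."""
    return -0.5 * np.sum((S - mean) ** 2) / sigma**2

def log_goal(S, goal, eps):
    """Toy goal log-likelihood log p(G | S, phi): Gaussian on the final state
    (an assumption chosen for illustration)."""
    return -0.5 * np.sum((S[-1] - goal) ** 2) / eps**2

def plan_map(mean, goal, sigma=1.0, eps=0.5, lr=0.05, steps=500):
    """Gradient ascent on log q(S | phi) + log p(G | S, phi) over trajectories S."""
    S = mean.copy()
    for _ in range(steps):
        grad = -(S - mean) / sigma**2          # gradient of log_q w.r.t. S
        grad[-1] += -(S[-1] - goal) / eps**2   # gradient of log_goal (final state only)
        S += lr * grad
    return S

# Hypothetical setup: a straight nominal path and a goal offset to one side.
T = 5
expert_mean = np.stack([np.linspace(0.0, 10.0, T), np.zeros(T)], axis=1)
goal = np.array([10.0, 2.0])
plan = plan_map(expert_mean, goal)
print(plan[-1])
```

In this toy setting the plan's final state settles between the nominal endpoint and the goal, weighted by the two precisions, which shows how the density term keeps the plan expert-like while the goal term pulls it toward the target.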
Abstract
Imitation Learning (IL) is an appealing approach to learn desirable autonomous behavior. However, directing IL to achieve arbitrary goals is difficult. In contrast, planning-based algorithms use dynamics models and reward functions to achieve goals. Yet, reward functions that evoke desirable behavior are often difficult to specify. In this paper, we propose Imitative Models to combine the benefits of IL and goal-directed planning. Imitative Models are probabilistic predictive models of desirable behavior able to plan interpretable expert-like trajectories to achieve specified goals. We derive families of flexible goal objectives, including constrained goal regions, unconstrained goal sets, and energy-based goals. We show that our method can use these objectives to successfully direct behavior. Our method substantially outperforms six IL approaches and a planning-based approach in a dynamic simulated autonomous driving task, and is efficiently learned from expert demonstrations without online data collection. We also show our approach is robust to poorly specified goals, such as goals on the wrong side of the road.
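The three goal families named in the abstract can be sketched as log-likelihood functions of a trajectory. The forms below are illustrative assumptions rather than the paper's exact definitions: an axis-aligned box for the goal region, an equal-weight Gaussian mixture over waypoints for the goal set, and a user-supplied cost for the energy-based goal. All function names and parameters are hypothetical, and normalization constants are omitted.

```python
import numpy as np

def log_goal_region(S, lo, hi):
    """Constrained goal region: 0 if the final state lies in the box [lo, hi],
    otherwise -inf (the trajectory is ruled out)."""
    inside = np.all((S[-1] >= lo) & (S[-1] <= hi))
    return 0.0 if inside else -np.inf

def log_goal_set(S, waypoints, eps=0.5):
    """Unconstrained goal set: unnormalized log of an equal-weight Gaussian
    mixture centered on each waypoint, evaluated at the final state."""
    d2 = np.sum((waypoints - S[-1]) ** 2, axis=1)
    a = -0.5 * d2 / eps**2
    m = a.max()
    return m + np.log(np.mean(np.exp(a - m)))  # numerically stable log-mean-exp

def log_goal_energy(S, cost):
    """Energy-based goal: log-likelihood proportional to the negative cost."""
    return -cost(S)

# Hypothetical usage on a two-step trajectory.
S = np.array([[0.0, 0.0], [1.0, 1.0]])
waypoints = np.array([[1.0, 1.0], [5.0, 5.0]])
print(log_goal_region(S, lo=np.array([0.5, 0.5]), hi=np.array([1.5, 1.5])))
print(log_goal_set(S, waypoints))
print(log_goal_energy(S, cost=lambda traj: float(np.sum(traj[-1] ** 2))))
```

Any of these terms could be added to a trajectory log-density to form a planning objective; the region form acts as a hard constraint, while the other two trade off smoothly against the density.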
