Agile Autonomous Driving using End-to-End Deep Imitation Learning

Yunpeng Pan, Ching-An Cheng, Kamil Saigol, Keuntaek Lee, Xinyan Yan, Evangelos Theodorou, Byron Boots

TL;DR

This work tackles agile, high-speed off-road autonomous driving using only low-cost sensors by learning an end-to-end DNN policy that imitates a model-predictive controller (MPC). It analyzes online versus batch imitation learning, showing that online IL (via DAgger and a mixed-expert data collection) yields better generalization and robustness to covariate shift, enabling safe, high-speed operation on a dirt track. The autonomous driving system combines a CNN for monocular vision with wheel-speed inputs, trained to map observations directly to steering and throttle without state estimation or online planning, and is validated on a 1/5-scale AutoRally platform. The MPC expert relies on a Sparse Spectrum Gaussian Process dynamics model and Differential Dynamic Programming, providing high-quality demonstrations that guide the learner. Overall, the approach demonstrates data-efficient, real-world autonomous driving with low-cost sensors and highlights the importance of online IL for reliability in stochastic, real-world environments.

Abstract

We present an end-to-end imitation learning system for agile, off-road autonomous driving using only low-cost sensors. By imitating a model predictive controller equipped with advanced sensors, we train a deep neural network control policy to map raw, high-dimensional observations to continuous steering and throttle commands. Compared with recent approaches to similar tasks, our method requires neither state estimation nor on-the-fly planning to navigate the vehicle. Our approach relies on, and experimentally validates, recent imitation learning theory. Empirically, we show that policies trained with online imitation learning overcome well-known challenges related to covariate shift and generalize better than policies trained with batch imitation learning. Built on these insights, our autonomous driving system demonstrates successful high-speed off-road driving, matching the state-of-the-art performance.
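The online imitation learning loop referred to above (DAgger with mixed expert/learner rollouts) can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the authors' implementation: the 1-D dynamics, the linear `expert` standing in for the MPC, the scalar learner weight `w`, and the mixing schedule `beta0 ** n` are all hypothetical choices made here.

```python
import numpy as np

def expert(x):
    # Hypothetical stand-in for the MPC expert: a linear stabilizing controller.
    return -0.8 * x

def rollout(w, beta, rng, horizon=50):
    """Run one episode with a mixed policy.

    At each step the expert action is executed with probability beta,
    otherwise the learner's action w * x. Every visited state is labeled
    with the expert's action, regardless of who was driving.
    """
    x = rng.normal()
    states, labels = [], []
    for _ in range(horizon):
        a_star = expert(x)
        states.append(x)
        labels.append(a_star)
        a = a_star if rng.random() < beta else w * x
        # Toy dynamics (assumption): x' = x + a + small noise, clipped for safety.
        x = float(np.clip(x + a + 0.05 * rng.normal(), -10.0, 10.0))
    return states, labels

def dagger(iterations=5, beta0=0.5, seed=0):
    """Aggregate expert-labeled data over iterations and refit the learner."""
    rng = np.random.default_rng(seed)
    w = 0.0
    X, Y = [], []
    for n in range(iterations):
        beta = beta0 ** n  # n = 0 gives beta = 1, i.e. a pure expert rollout
        s, y = rollout(w, beta, rng)
        X += s
        Y += y
        xs, ys = np.array(X), np.array(Y)
        # Least-squares fit of a = w * x on all data collected so far.
        w = float(xs @ ys / (xs @ xs))
    return w
```

Batch imitation learning corresponds to stopping after the first iteration, where only states visited by the expert are seen. The later iterations add states visited by the learner itself, which is what addresses covariate shift.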

Paper Structure

This paper contains 27 sections, 1 theorem, 26 equations, 10 figures, 2 tables.

Key Result

Lemma 1

Define $d_\pi(s,t) = \frac{1}{T} d_{\pi}^t(s)$ as a generalized stationary time-state distribution, where $d_{\pi}^t$ is the distribution of state at time $t$ when running policy $\pi$. Let $\pi$ and $\pi'$ be two policies. Then

$$J(\pi) = J(\pi') + \mathbb{E}_{s,t \sim d_\pi} \mathbb{E}_{a \sim \pi_s} \left[ A_{\pi'}^t(s,a) \right],$$

where $A_{\pi'}^t(s, a) = Q_{\pi'}^t(s,a) - V_{\pi'}^t(s)$ is the (dis)advantage function at time $t$ with respect to running $\pi'$.
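Lemma 1 is a finite-horizon form of the performance difference lemma: the gap between the expected costs of $\pi$ and $\pi'$ equals the expected (dis)advantage of $\pi$'s actions with respect to $\pi'$, taken under $\pi$'s own time-state distribution. The sketch below checks this numerically on a small random tabular MDP. The MDP, the function names, and the convention that $J$ is the expected cost averaged over the $T$ steps (so that it matches the $\frac{1}{T}$ in $d_\pi$) are assumptions made here for illustration; if $J$ is instead a sum over time, the advantage term picks up a factor of $T$.

```python
import numpy as np

def evaluate(P, c, pi, T):
    """Backward recursion for the cost-to-go of policy pi.

    P[a, s, n]: probability of next state n given state s and action a.
    c[s, a]: instantaneous cost. pi[s, a]: action probabilities.
    Returns V[t, s] and Q[t, s, a] for t = 0..T-1.
    """
    S, A = c.shape
    V = np.zeros((T + 1, S))  # V[T] = 0 is the terminal condition
    Q = np.zeros((T, S, A))
    for t in reversed(range(T)):
        Q[t] = c + np.einsum("asn,n->sa", P, V[t + 1])
        V[t] = (pi * Q[t]).sum(axis=1)
    return V[:T], Q

def state_distributions(P, pi, d0, T):
    """d[t, s]: distribution of the state at time t when running pi."""
    d = np.zeros((T, d0.shape[0]))
    d[0] = d0
    P_pi = np.einsum("sa,asn->sn", pi, P)  # state-to-state kernel under pi
    for t in range(T - 1):
        d[t + 1] = d[t] @ P_pi
    return d

def check_lemma(seed=0, S=4, A=3, T=6):
    rng = np.random.default_rng(seed)
    P = rng.random((A, S, S))
    P /= P.sum(axis=2, keepdims=True)
    c = rng.random((S, A))
    pi = rng.random((S, A))
    pi /= pi.sum(axis=1, keepdims=True)
    pi2 = rng.random((S, A))
    pi2 /= pi2.sum(axis=1, keepdims=True)
    d0 = np.full(S, 1.0 / S)

    V1, _ = evaluate(P, c, pi, T)
    V2, Q2 = evaluate(P, c, pi2, T)
    J1 = d0 @ V1[0] / T  # time-averaged expected cost of pi
    J2 = d0 @ V2[0] / T  # time-averaged expected cost of pi'
    d = state_distributions(P, pi, d0, T)
    adv = Q2 - V2[:, :, None]  # A_{pi'}^t(s, a)
    # Expectation over (s, t) ~ d_pi and a ~ pi, with d_pi(s, t) = d[t, s] / T.
    gap = (d[:, :, None] * pi[None] * adv).sum() / T
    return J1, J2 + gap
```

Calling `check_lemma()` returns the two sides of the identity, which should agree up to floating-point error for any seed.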

Figures (10)

  • Figure 1: The high-speed off-road driving task.
  • Figure 2: System diagram.
  • Figure 3: The DNN control policy.
  • Figure 4: The AutoRally car and the test track.
  • Figure 5: Examples of vehicle trajectories, where online IL avoids the crashing case encountered by batch IL. (b) and (c) depict the test runs after training on 9,000 samples.
  • ...and 5 more figures

Theorems & Definitions (2)

  • Lemma 1
  • Definition 1