Query-Efficient Imitation Learning for End-to-End Autonomous Driving

Jiakai Zhang, Kyunghyun Cho

TL;DR

The paper addresses the high query cost and safety concerns of imitation learning for end-to-end autonomous driving. It introduces SafeDAgger, a query-efficient extension of DAgger that uses a learned safety policy to decide when to rely on the primary policy and when to fall back on a reference policy, which yields a form of automated curriculum learning. Empirical results in the TORCS racing simulator show that SafeDAgger reduces the number of reference-policy queries and accelerates convergence while maintaining or improving driving safety and performance. This approach offers a practical path toward scalable, safe imitation learning for real-world autonomous driving, with potential extensions to reinforcement learning and other learning-to-search algorithms.

Abstract

One way to approach end-to-end autonomous driving is to learn a policy function that maps a sensory input, such as an image frame from a front-facing camera, to a driving action by imitating an expert driver, or reference policy. This can be done by supervised learning, where the policy function is tuned to minimize the difference between the predicted and ground-truth actions. A policy function trained in this way, however, is known to suffer from unexpected behaviours due to the mismatch between the states reachable by the reference policy and those reachable by the trained policy. More advanced imitation learning algorithms, such as DAgger, address this issue by iteratively collecting training examples from both the reference and trained policies. These algorithms often require a large number of queries to the reference policy, which is undesirable because the reference policy is often expensive. In this paper, we propose an extension of DAgger, called SafeDAgger, that is query-efficient and more suitable for end-to-end autonomous driving. We evaluate the proposed SafeDAgger in a car racing simulator and show that it indeed requires fewer queries to the reference policy. We observe a significant speed-up in convergence, which we conjecture to be due to the effect of automated curriculum learning.
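The gating idea summarized above (drive with the primary policy where a safety policy deems it safe, otherwise fall back to the reference policy) can be illustrated with a small sketch. This is a simplified illustration, not the paper's algorithm: the function name `gated_drive`, the scalar `Observation` and `Action` types, and the toy policies are all hypothetical, and the observations are a fixed list, whereas in a real rollout each observation would depend on the previous action.

```python
from typing import Callable, List, Sequence, Tuple

Observation = float  # hypothetical stand-in for an image frame
Action = float       # hypothetical steering command


def gated_drive(
    observations: Sequence[Observation],
    primary: Callable[[Observation], Action],
    reference: Callable[[Observation], Action],
    is_safe: Callable[[Observation], bool],
) -> Tuple[List[Action], List[Tuple[Observation, Action]]]:
    """Pick an action per observation and record every reference query."""
    actions: List[Action] = []
    queried: List[Tuple[Observation, Action]] = []
    for obs in observations:
        if is_safe(obs):
            # The safety policy trusts the primary policy: no query is spent.
            actions.append(primary(obs))
        else:
            # Fall back to the reference policy; this costs one query and
            # yields a labelled example that could be used for retraining.
            expert_action = reference(obs)
            actions.append(expert_action)
            queried.append((obs, expert_action))
    return actions, queried


# Toy usage with made-up policies (assumptions for illustration only).
obs_seq = [0.0, 0.5, 1.0, 1.5]
actions, queried = gated_drive(
    obs_seq,
    primary=lambda o: 0.0,
    reference=lambda o: -o,
    is_safe=lambda o: abs(o) < 0.75,
)
print(len(queried), "of", len(obs_seq), "observations needed a reference query")
```

In this sketch the reference policy is consulted only on observations flagged as unsafe, which is the source of the query savings the abstract describes.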

Paper Structure

This paper contains 39 sections, 7 equations, 6 figures, 1 table, 1 algorithm.

Figures (6)

  • Figure 1: The histogram of the $\log$ square errors of steering angle after supervised learning only. The dashed line is located at $\tau=0.0025$. $77.70\%$ of the training examples are considered safe.
  • Figure 2: (a) Average number of laps ($\uparrow$), (b) damage per lap ($\downarrow$) and (c) the mean squared error of steering angle for each configuration (training strategy--driving strategy) over the iterations. We use solid and dashed curves for the cases without and with traffic, respectively.
  • Figure 3: The portion of time driven by a reference policy during test. We see a clear downward trend as the iteration continues.
  • Figure 4: Training and test tracks with sample frames.
  • Figure 5: The configuration of a primary policy network. Each convolutional layer is denoted by "Conv - # channels $\times$ height $\times$ width". Max pooling without overlap follows each convolutional layer. We use rectified linear units (Nair and Hinton, 2010; Glorot et al., 2011) for point-wise nonlinearities. Only the shaded part of the full network is used during test.
  • ...and 1 more figure
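The Figure 1 caption implies a thresholding rule: a training example counts as safe when the squared error between the primary policy's steering angle and the reference steering angle falls below $\tau=0.0025$. A minimal sketch of that labelling follows. The function names and the sample steering values are hypothetical, and the exact form of the comparison (here, squared error at or below $\tau$) is an assumption rather than something the caption specifies.

```python
from typing import List, Sequence

TAU = 0.0025  # threshold quoted in the Figure 1 caption


def safety_labels(
    predicted: Sequence[float],
    reference: Sequence[float],
    tau: float = TAU,
) -> List[int]:
    """Return 1 (safe) where the squared steering error is at most tau, else 0."""
    return [int((p - r) ** 2 <= tau) for p, r in zip(predicted, reference)]


def safe_fraction(labels: Sequence[int]) -> float:
    """Fraction of examples labelled safe."""
    return sum(labels) / len(labels)


# Made-up steering angles, for illustration only.
predicted = [0.10, 0.00, -0.20, 0.30]
reference = [0.12, 0.10, -0.21, 0.30]
labels = safety_labels(predicted, reference)
print(labels, safe_fraction(labels))
```

Applied to the actual training set, a computation of this kind would produce the proportion reported in the caption ($77.70\%$ safe); the toy values above are not taken from the paper.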