Imitation Is Not Enough: Robustifying Imitation with Reinforcement Learning for Challenging Driving Scenarios

Yiren Lu, Justin Fu, George Tucker, Xinlei Pan, Eli Bronstein, Rebecca Roelofs, Benjamin Sapp, Brandyn White, Aleksandra Faust, Shimon Whiteson, Dragomir Anguelov, Sergey Levine

TL;DR

The paper addresses safety and reliability gaps in imitation learning (IL) for autonomous driving, particularly in rare, high-risk scenarios. It introduces BC-SAC, a hybrid approach that combines IL with reinforcement learning (RL) using a simple safety reward, trained on over 100k miles of real-world urban driving data. The results show substantially improved safety in difficult scenarios while preserving human-like driving behavior, with a reduction in failures of over 38% on the most challenging scenarios relative to imitation alone. Further analysis, including difficulty-based training and ablations, supports combining imitation with reinforcement learning and training on harder examples to improve robustness and safety in real-world driving.

Abstract

Imitation learning (IL) is a simple and powerful way to use high-quality human driving data, which can be collected at scale, to produce human-like behavior. However, policies based on imitation learning alone often fail to sufficiently account for safety and reliability concerns. In this paper, we show how imitation learning combined with reinforcement learning using simple rewards can substantially improve the safety and reliability of driving policies over those learned from imitation alone. In particular, we train a policy on over 100k miles of urban driving data, and measure its effectiveness in test scenarios grouped by different levels of collision likelihood. Our analysis shows that while imitation can perform well in low-difficulty scenarios that are well-covered by the demonstration data, our proposed approach significantly improves robustness on the most challenging scenarios (over 38% reduction in failures). To our knowledge, this is the first application of a combined imitation and reinforcement learning approach in autonomous driving that utilizes large amounts of real-world human driving data.
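
To make the idea of "imitation learning combined with reinforcement learning" concrete, the following is a minimal sketch of one way such a combined actor objective could look. It is not taken from the paper: the additive form, the function names, and the parameters bc_weight and entropy_coef are assumptions chosen for illustration, and a diagonal Gaussian policy is assumed.

```python
import numpy as np


def gaussian_log_prob(action, mean, std):
    """Log-density of a diagonal Gaussian policy evaluated at `action`."""
    action, mean, std = (np.asarray(x, dtype=float) for x in (action, mean, std))
    var = std ** 2
    return float(np.sum(-0.5 * ((action - mean) ** 2 / var + np.log(2.0 * np.pi * var))))


def combined_actor_loss(q_value, policy_log_prob, expert_log_prob,
                        bc_weight=1.0, entropy_coef=0.1):
    """Hypothetical IL+RL actor loss (to be minimized).

    rl_term: SAC-style term, entropy_coef * log pi(a|s) - Q(s, a),
             for an action sampled from the policy.
    il_term: behavior-cloning term, the negative log-likelihood of the
             logged expert action under the policy.
    The weighted sum is an assumption, not the paper's stated objective.
    """
    rl_term = entropy_coef * policy_log_prob - q_value
    il_term = -expert_log_prob
    return rl_term + bc_weight * il_term


# Example: the policy mean is near the expert action, so the IL term is small.
mean, std = np.array([0.0, 0.0]), np.array([1.0, 1.0])
expert_lp = gaussian_log_prob([0.1, -0.1], mean, std)
sampled_lp = gaussian_log_prob([0.5, 0.2], mean, std)
loss = combined_actor_loss(q_value=1.0, policy_log_prob=sampled_lp,
                           expert_log_prob=expert_lp)
```

In this sketch, setting bc_weight to zero leaves a purely reward-driven loss, while a large bc_weight makes the loss approach plain behavior cloning.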

Paper Structure (19 sections, 8 equations, 11 figures, 3 tables)

Figures (11)

  • Figure 1: The demonstration-reward trade-off. As the amount of data for a particular scenario decreases, reward signals become more important for learning. We show a few visual examples representing scenarios with different frequencies.
  • Figure 2: Different objective influence. For in-distribution states, both IL and RL objectives provide learning signal. For out-of-distribution states, the RL objective dominates.
  • Figure 3: Failure rates on the most challenging evaluation sets: Top1 and Top10 (lower is better, with training on All and Top10). BC-SAC consistently achieves the lowest error rates.
  • Figure 4: Failure rates of BC, MGAIL, and BC-SAC across scenarios of varying difficulty levels (50%-100%, lower is better). While all methods perform worse as the evaluation dataset becomes more challenging, BC-SAC always performs best and shows the least degradation.
  • Figure 5: Marginal action distributions. SAC/BC-SAC (orange) vs logs (blue).
  • ...and 6 more figures
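
The captions above refer to evaluation sets such as Top1 and Top10 and to failure rates across difficulty levels. The sketch below shows one way such subsets and rates could be computed. It assumes that "TopK" means the K% of scenarios with the highest difficulty score, and that each scenario carries a scalar difficulty score and a boolean failure flag. These assumptions, the function names, and all numbers are illustrative and are not results from the paper.

```python
import numpy as np


def top_percent_subset(difficulty, percent):
    """Indices of the `percent`% highest-difficulty scenarios (at least one)."""
    difficulty = np.asarray(difficulty, dtype=float)
    k = max(1, int(np.ceil(len(difficulty) * percent / 100.0)))
    order = np.argsort(-difficulty, kind="stable")  # hardest first
    return order[:k]


def failure_rate(failed, indices):
    """Fraction of the selected scenarios in which the policy failed."""
    return float(np.mean(np.asarray(failed, dtype=float)[indices]))


def relative_reduction(baseline_rate, new_rate):
    """Relative reduction in failure rate with respect to a baseline."""
    return (baseline_rate - new_rate) / baseline_rate


# Hypothetical per-scenario difficulty scores and failure flags.
difficulty = [0.1, 0.9, 0.5, 0.7, 0.3, 0.2, 0.8, 0.4, 0.6, 0.05]
failed = [False, True, False, False, False, False, False, False, True, False]

hardest = top_percent_subset(difficulty, 20)   # two hardest scenarios
rate_hardest = failure_rate(failed, hardest)   # one of the two failed
```

Under this reading, a figure such as "over 38% reduction in failures" corresponds to relative_reduction applied to the baseline and proposed failure rates on the same subset.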