
CIMRL: Combining IMitation and Reinforcement Learning for Safe Autonomous Driving

Jonathan Booher, Khashayar Rohanimanesh, Junhong Xu, Vladislav Isenbaev, Ashwin Balakrishna, Ishan Gupta, Wei Liu, Aleksandr Petiushko

TL;DR

This paper proposes CIMRL (Combining IMitation and Reinforcement Learning), a safe reinforcement learning framework that trains driving policies in simulation by leveraging imitative motion priors and safety constraints, and that achieves state-of-the-art results in closed-loop simulation and real-world driving benchmarks.

Abstract

Modern approaches to autonomous driving rely heavily on learned components trained with large amounts of human driving data via imitation learning. However, these methods require expensive large-scale data collection and even then struggle to handle long-tail scenarios safely and to avoid compounding errors over time. At the same time, pure Reinforcement Learning (RL) methods can fail to learn performant policies in sparse, constrained, and challenging-to-define reward settings such as autonomous driving. These challenges make it difficult to deploy purely cloned or pure RL policies in safety-critical applications such as autonomous vehicles. In this paper we propose CIMRL (Combining IMitation and Reinforcement Learning), a safe reinforcement learning framework that enables training driving policies in simulation by leveraging imitative motion priors and safety constraints. CIMRL does not require extensive reward specification and improves on the closed-loop behavior of pure cloning methods. By combining RL and imitation, we demonstrate that our method achieves state-of-the-art results in closed-loop simulation and real-world driving benchmarks.

Paper Structure (17 sections, 11 equations, 1 figure, 2 tables, 1 algorithm)

Figures (1)

  • Figure 1: Illustration of the CIMRL algorithm. The model combines imitation learning with safe reinforcement learning by restricting the action space to an efficient support derived from the motion prior generated by a pretrained imitation learning model. State and action are encoded via deep neural networks, concatenated, and used to predict both task and risk values. The model is initially trained in simulation and then deployed in real-world environments, ensuring robust and scalable performance. We use the $Q_{risk}$ estimation to identify safe actions based on a risk threshold. If such actions exist, we use the task policy $\pi_{task}$ to select exclusively from the safe actions. However, should there be no safe actions available, we fall back to using the recovery policy $\pi_{recov}$ which is optimized to guide the agent back to a safe state.
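
The selection rule described in the Figure 1 caption can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `select_action`, the use of a greedy argmax over task values as a stand-in for $\pi_{task}$, and the use of the lowest-risk candidate as a stand-in for the learned $\pi_{recov}$ are all hypothetical. The candidate actions are assumed to come from the motion prior, with one task value and one risk value per candidate.

```python
import numpy as np


def select_action(q_task, q_risk, risk_threshold):
    """Pick one of K candidate actions proposed by the motion prior.

    q_task: task value per candidate (higher is better).
    q_risk: estimated risk per candidate (lower is safer).
    Returns (index, mode), where mode is "task" or "recovery".
    """
    q_task = np.asarray(q_task, dtype=float)
    q_risk = np.asarray(q_risk, dtype=float)

    # A candidate counts as safe if its risk estimate is within the threshold
    # (using <= rather than < is an assumption of this sketch).
    safe = q_risk <= risk_threshold

    if safe.any():
        # Stand-in for pi_task: greedy choice restricted to the safe candidates.
        masked = np.where(safe, q_task, -np.inf)
        return int(np.argmax(masked)), "task"

    # Stand-in for pi_recov: with no safe candidate, take the least risky one.
    # The paper describes a separately optimized recovery policy instead.
    return int(np.argmin(q_risk)), "recovery"


if __name__ == "__main__":
    # Candidate 1 has the best task value but exceeds the risk threshold.
    print(select_action([1.0, 3.0, 2.0], [0.1, 0.9, 0.2], risk_threshold=0.3))
    # No candidate is safe, so the fallback branch is taken.
    print(select_action([1.0, 3.0, 2.0], [0.8, 0.5, 0.9], risk_threshold=0.3))
```

In the first call the highest-value candidate is excluded because its risk estimate exceeds the threshold, so the choice is made among the remaining safe candidates. In the second call every candidate is above the threshold and the fallback branch is used.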