CIMRL: Combining IMitation and Reinforcement Learning for Safe Autonomous Driving

Jonathan Booher; Khashayar Rohanimanesh; Junhong Xu; Vladislav Isenbaev; Ashwin Balakrishna; Ishan Gupta; Wei Liu; Aleksandr Petiushko

TL;DR

This paper proposes the Combining IMitation and Reinforcement Learning (CIMRL) approach, a safe reinforcement learning framework that trains driving policies in simulation by leveraging imitative motion priors and safety constraints. The method achieves state-of-the-art results in closed-loop simulation and real-world driving benchmarks.

Abstract

Modern approaches to autonomous driving rely heavily on learned components trained on large amounts of human driving data via imitation learning. However, these methods require expensive large-scale data collection, and even then they struggle to handle long-tail scenarios safely and suffer from errors that compound over time. At the same time, pure Reinforcement Learning (RL) methods can fail to learn performant policies in settings such as autonomous driving, where rewards are sparse, constrained, and difficult to define. These challenges make it difficult to deploy purely cloned or pure RL policies in safety-critical applications such as autonomous vehicles. In this paper we propose the Combining IMitation and Reinforcement Learning (CIMRL) approach, a safe reinforcement learning framework that trains driving policies in simulation by leveraging imitative motion priors and safety constraints. CIMRL does not require extensive reward specification and improves on the closed-loop behavior of pure cloning methods. By combining RL and imitation, we demonstrate that our method achieves state-of-the-art results in closed-loop simulation and real-world driving benchmarks.

Paper Structure (17 sections, 11 equations, 1 figure, 2 tables, 1 algorithm)

Introduction
Related Work
Combining IMitation and Reinforcement Learning (CIMRL)
CIMRL Model
Actions
Suppressed Task Value
D-SAC with Tree Backup
Model Architecture
Encoder
Decoder
Model Training
Experiments
Waymax
Setup
Result
...and 2 more sections

Figures (1)

Figure 1: Illustration of the CIMRL algorithm. The model combines imitation learning with safe reinforcement learning by restricting the action space to an efficient support derived from the motion prior generated by a pretrained imitation learning model. State and action are encoded via deep neural networks, concatenated, and used to predict both task and risk values. The model is initially trained in simulation and then deployed in real-world environments, ensuring robust and scalable performance. We use the $Q_{risk}$ estimation to identify safe actions based on a risk threshold. If such actions exist, we use the task policy $\pi_{task}$ to select exclusively from the safe actions. However, should there be no safe actions available, we fall back to using the recovery policy $\pi_{recov}$ which is optimized to guide the agent back to a safe state.
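The action-selection rule described in the caption can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function name `select_action`, the use of per-action logits, the softmax sampling, the `<=` comparison against the threshold, and the choice to let the recovery policy pick among all candidate actions are assumptions made for the example. The caption only states that safe actions are identified by thresholding $Q_{risk}$, that $\pi_{task}$ chooses among them, and that $\pi_{recov}$ is used when none are safe.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a non-empty sequence of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(
    q_risk: Sequence[float],        # estimated risk for each candidate action
    task_logits: Sequence[float],   # hypothetical pi_task scores per action
    recov_logits: Sequence[float],  # hypothetical pi_recov scores per action
    risk_threshold: float,
    rng: Optional[random.Random] = None,
) -> Tuple[int, bool]:
    """Return (action_index, used_recovery).

    Candidates are assumed to be a discrete set of actions drawn from
    the imitation-learned motion prior.
    """
    rng = rng or random.Random()

    # Actions whose estimated risk is within the threshold count as safe
    # (assumption: inclusive comparison).
    safe = [i for i, r in enumerate(q_risk) if r <= risk_threshold]

    if safe:
        # Task policy restricted to the safe subset: renormalize over it.
        probs = softmax([task_logits[i] for i in safe])
        return rng.choices(safe, weights=probs, k=1)[0], False

    # No safe action available: fall back to the recovery policy
    # (assumption: it chooses among all candidates).
    indices = list(range(len(recov_logits)))
    probs = softmax(list(recov_logits))
    return rng.choices(indices, weights=probs, k=1)[0], True
```

For example, with `q_risk = [0.9, 0.1, 0.2]` and a threshold of `0.5`, the first action is excluded regardless of how highly the task policy scores it, and the fallback flag is only set when every candidate exceeds the threshold.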
