Scaling Is All You Need: Autonomous Driving with JAX-Accelerated Reinforcement Learning

Moritz Harmel; Anubhav Paras; Andreas Pasternak; Nicholas Roy; Gary Linscott

TL;DR

The best-performing policy reduces the failure rate by 64% while improving the rate of driving progress by 25%, compared to policies produced by state-of-the-art machine learning for autonomous driving.

Abstract

Reinforcement learning has been demonstrated to outperform even the best humans in complex domains like video games. However, running reinforcement learning experiments at the scale required for autonomous driving is extremely difficult. Building a large-scale reinforcement learning system and distributing it across many GPUs is challenging. Gathering experience on real-world vehicles during training is prohibitive from a safety and scalability perspective. Therefore, an efficient and realistic driving simulator is required that uses a large amount of data from real-world driving. We bring these capabilities together and conduct large-scale reinforcement learning experiments for autonomous driving. We demonstrate that our policy performance improves with increasing scale. Our best-performing policy reduces the failure rate by 64% while improving the rate of driving progress by 25%, compared to policies produced by state-of-the-art machine learning for autonomous driving.

Paper Structure (23 sections, 4 figures, 4 tables)

Introduction
Related work
Large scale RL for autonomous driving
Scene generation and agent interactions
Accelerated autonomous driving simulator
Preparing data for parallel execution
The accelerated simulation utilizing JAX
Simulator performance benchmark
RL problem formulation
Observation space
Action space
Rewards
Distributed learning system
Evaluation
Metrics
...and 8 more sections

Figures (4)

Figure 1: Results for experiments with different model sizes (rows) and dataset sizes (columns). Colors represent the numerical results on color scales. (a) The performance of the policy improves with increasing model and dataset size. (b) The model size is the major driver of the required GPU time and therefore of the training cost. Dataset size has no effect on the training time, but it can affect one-time costs during data preprocessing, which are not considered here.
Figure 2: Route and road network observations obtained from the roads library.
Figure 3: Examples of conservative detection of collisions and off-route events.
Figure 4: Training curves for experiments with and without the SL pre-training on the 2.5M parameter model and the 6000 h dataset.
