DriverGym: Democratising Reinforcement Learning for Autonomous Driving

Parth Kothari, Christian Perone, Luca Bergamini, Alexandre Alahi, Peter Ondruska

TL;DR

Autonomous driving RL is hampered by safety concerns and limited access to real-world data for training and validation. The authors introduce DriverGym, an OpenAI Gym–compatible environment built on 1,000+ hours of Level 5 logs, enabling training and evaluation of RL policies with reactive, data-driven surrounding agents. They offer an extensible closed-loop evaluation protocol, pre-trained models, and reproducible training code to spur community development. Experimental comparisons among supervised imitation (SL), supervised imitation with perturbations (SL+P), and PPO show trade-offs between displacement and collision metrics, illustrating both the feasibility and the challenges of RL-based planning on real-world data.

Abstract

Despite promising progress in reinforcement learning (RL), developing algorithms for autonomous driving (AD) remains challenging. One critical issue is the absence of an open-source platform capable of training RL policies and effectively validating them on real-world data. We propose DriverGym, an open-source, OpenAI Gym-compatible environment specifically tailored for developing RL algorithms for autonomous driving. DriverGym provides access to more than 1000 hours of expert logged data and also supports reactive and data-driven agent behavior. The performance of an RL policy can be easily validated on real-world data using our extensive and flexible closed-loop evaluation protocol. In this work, we also provide behavior cloning baselines using supervised learning and RL, trained in DriverGym. We make the DriverGym code, as well as all the baselines, publicly available to further stimulate development from the community.
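
To make the idea of closed-loop evaluation in a Gym-style environment concrete, the following is a minimal sketch under stated assumptions, not the DriverGym API. The class `ToyLogReplayEnv`, the function `rollout`, the one-dimensional positions, and the reward weights are all hypothetical and chosen only for illustration. The sketch follows the classic Gym convention (`reset()` returns an observation, `step(action)` returns observation, reward, done flag, and info) and reports a displacement metric and a collision flag per episode.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Obs = Tuple[float, float]  # (ego position, other agent position)


@dataclass
class ToyLogReplayEnv:
    """Hypothetical 1-D stand-in for a Gym-style driving environment."""
    expert_log: List[float]   # logged expert ego positions (length >= 2)
    agent_log: List[float]    # logged positions of one surrounding agent
    collision_radius: float = 0.5
    t: int = 0
    ego: float = 0.0

    def _obs(self) -> Obs:
        return (self.ego, self.agent_log[self.t])

    def reset(self) -> Obs:
        # Start the episode at the first logged expert position.
        self.t = 0
        self.ego = self.expert_log[0]
        return self._obs()

    def step(self, action: float):
        # Closed loop: the ego moves by the policy's action, not by the log,
        # so prediction errors accumulate over the episode.
        self.ego += action
        self.t += 1
        displacement = abs(self.ego - self.expert_log[self.t])
        collided = abs(self.ego - self.agent_log[self.t]) < self.collision_radius
        reward = -displacement - (10.0 if collided else 0.0)  # assumed weights
        done = collided or self.t == len(self.expert_log) - 1
        info = {"displacement": displacement, "collision": collided}
        return self._obs(), reward, done, info


def rollout(env: ToyLogReplayEnv, policy: Callable[[Obs], float]) -> Dict[str, object]:
    """Run one episode and aggregate closed-loop metrics."""
    obs = env.reset()
    done = False
    displacements: List[float] = []
    collided = False
    while not done:
        obs, _reward, done, info = env.step(policy(obs))
        displacements.append(info["displacement"])
        collided = collided or info["collision"]
    return {
        "mean_displacement": sum(displacements) / len(displacements),
        "collision": collided,
    }


if __name__ == "__main__":
    env = ToyLogReplayEnv(expert_log=[0.0, 1.0, 2.0, 3.0], agent_log=[5.0] * 4)
    print(rollout(env, lambda obs: 1.0))  # policy that matches the expert
    print(rollout(env, lambda obs: 2.5))  # policy that drifts into the agent
```

In this toy setting the surrounding agent simply replays its log; a reactive agent, as described in the abstract, would instead update its own position in response to the ego inside `step`.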

Paper Structure

This paper contains 18 sections, 4 figures, 3 tables.

Figures (4)

  • Figure 1: DriverGym: an open-source gym environment that enables training RL driving policies on real-world data. The RL policy can access rich semantic maps to control the ego (red). Other agents (blue) can either be simulated from the data logs or controlled using a dedicated policy pre-trained on real-world data. We provide an extensible evaluation system (purple) with easily configurable metrics to evaluate the idiosyncrasies of the trained policies.
  • Figure 2: Visualization of an episode rollout (ego in red, agents in blue) in DriverGym. The policy prediction (green line) is scaled by a factor of 10 and shown at 2-second intervals for better viewing.
  • Figure 3: Example Rasterization Modes
  • Figure 4: Evaluation Plan