Building reliable sim driving agents by scaling self-play

Daphne Cornelisse; Aarav Pandya; Kevin Joseph; Joseph Suárez; Eugene Vinitsky

TL;DR

The paper tackles the reliability gap in simulation driving agents by scaling self-play reinforcement learning to thousands of scenarios from the Waymo Open Motion Dataset under semi-realistic limits on human perception and control. Using a GPU-accelerated, data-driven multi-agent simulator and a decentralized PPO setup, the authors report a 99.8% goal completion rate with less than 0.8% combined collision and off-road incidents across 10,000 held-out scenes, and show strong generalization when training data is abundant. They also reveal limitations in rare or out-of-distribution scenarios and illustrate rapid adaptation through fine-tuning on small hand-designed sets. By open-sourcing the pre-trained agents and integrating them into a batched simulator, the work provides a practical pathway for scalable, reliable AV simulation and evaluation. The findings may inform safe, automated AV development pipelines and could extend to other agent-based modeling domains.

Abstract

Simulation agents are essential for designing and testing systems that interact with humans, such as autonomous vehicles (AVs). These agents serve various purposes, from benchmarking AV performance to stress-testing system limits, but all applications share one key requirement: reliability. To enable sound experimentation, a simulation agent must behave as intended. It should minimize actions that may lead to undesired outcomes, such as collisions, which can distort the signal-to-noise ratio in analyses. As a foundation for reliable sim agents, we propose scaling self-play to thousands of scenarios on the Waymo Open Motion Dataset under semi-realistic limits on human perception and control. Training from scratch on a single GPU, our agents solve almost the full training set within a day. They generalize to unseen test scenes, achieving a 99.8% goal completion rate with less than 0.8% combined collision and off-road incidents across 10,000 held-out scenarios. Beyond in-distribution generalization, our agents show partial robustness to out-of-distribution scenes and can be fine-tuned in minutes to reach near-perfect performance in such cases. We open-source the pre-trained agents and integrate them with a batched multi-agent simulator. Demonstrations of agent behaviors can be viewed at https://sites.google.com/view/reliable-sim-agents, and we open-source our agents at https://github.com/Emerge-Lab/gpudrive.
