Waymax: An Accelerated, Data-Driven Simulator for Large-Scale Autonomous Driving Research

Cole Gulino, Justin Fu, Wenjie Luo, George Tucker, Eli Bronstein, Yiren Lu, Jean Harb, Xinlei Pan, Yan Wang, Xiangyu Chen, John D. Co-Reyes, Rishabh Agarwal, Rebecca Roelofs, Yao Lu, Nico Montali, Paul Mougin, Zoey Yang, Brandyn White, Aleksandra Faust, Rowan McAllister, Dragomir Anguelov, Benjamin Sapp

TL;DR

Waymax addresses the need for fast, realistic multi-agent autonomous driving simulation by combining real-world data from the Waymo Open Motion Dataset (WOMD) with hardware-accelerated, differentiable, in-graph computation. It provides a modular, Gym-like API with both multi-agent and ego-planning workflows, and benchmarks a range of imitation learning (IL) and reinforcement learning (RL) baselines with route-conditioned metrics and ablations. Key contributions include data-driven scenario initialization, differentiable simulation, in-graph training compatibility, and a benchmarking suite that reveals trade-offs between route guidance and agent interactions. The approach supports scalable, more realistic planning research, and future work aims to close the sim-to-real gap through techniques such as domain randomization and hybrid data usage.

Abstract

Simulation is an essential tool to develop and benchmark autonomous vehicle planning software in a safe and cost-effective manner. However, realistic simulation requires accurate modeling of nuanced and complex multi-agent interactive behaviors. To address these challenges, we introduce Waymax, a new data-driven simulator for autonomous driving in multi-agent scenes, designed for large-scale simulation and testing. Waymax uses publicly-released, real-world driving data (e.g., the Waymo Open Motion Dataset) to initialize or play back a diverse set of multi-agent simulated scenarios. It runs entirely on hardware accelerators such as TPUs/GPUs and supports in-graph simulation for training, making it suitable for modern large-scale, distributed machine learning workflows. To support online training and evaluation, Waymax includes several learned and hard-coded behavior models that allow for realistic interaction within simulation. To supplement Waymax, we benchmark a suite of popular imitation and reinforcement learning algorithms with ablation studies on different design decisions, where we highlight the effectiveness of routes as guidance for planning agents and the ability of RL to overfit against simulated agents.

Paper Structure

This paper contains 40 sections, 4 equations, 5 figures, 6 tables.

Figures (5)

  • Figure 1: Two examples demonstrating the types of interactive, urban driving scenarios available in Waymax. (a) shows a vehicle waiting for oncoming traffic to pass before turning into a narrow street. (b) shows an agent performing a left turn at a 4-way intersection while following a route (boundaries highlighted in green).
  • Figure 2: A sample of features available in Waymax. (a) The routes given to an agent (all areas highlighted in color) are computed by combining the logged future trajectory of the agent with all possible future routes after the logged trajectory. (b) Waymax is bundled with reactive simulated agents. Here, agent #5 (circled in red) is stopped in front of an intersection, causing the IDM-controlled agents (#1, #2, #3, and #6) to brake in order to avoid collision.
  • Figure 3: An illustration of a simulation rollout using reactive simulated agents to control non-AV agents, and a user-defined policy to control the AV.
  • Figure 4: Runtime in milliseconds (y-axis) plotted against number of objects simulated (x-axis). The runtime reported is the sum of Reset + Transition + Metrics. Note that while CPU runtime scales linearly with the number of objects simulated, GPU performance is not saturated under the same experimental parameters.
  • Figure 5: Memory usage in megabytes (y-axis) plotted against number of objects simulated (x-axis). The memory usage reported is sampled during the execution of the rollout function. Memory usage has a fixed cost, then scales roughly linearly with the number of objects.
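
The braking behavior described in Figure 2(b) can be illustrated with the standard Intelligent Driver Model (IDM) car-following rule. The block below is a minimal sketch in plain Python, not Waymax's implementation: the function names (`idm_acceleration`, `rollout`), the parameter values, and the fixed-step Euler integration are all assumptions chosen for illustration. It shows a single follower approaching a stopped leader, where the IDM interaction term grows as the gap closes and forces the follower to brake.

```python
import math


def idm_acceleration(v, v_lead, gap, v0=15.0, T=1.5, a_max=1.5, b=2.0,
                     s0=2.0, delta=4.0):
    """Standard IDM longitudinal acceleration (m/s^2) for a following vehicle.

    All parameter values are illustrative assumptions:
      v0    desired free-road speed (m/s)
      T     desired time headway (s)
      a_max maximum acceleration (m/s^2)
      b     comfortable deceleration (m/s^2)
      s0    minimum standstill gap (m)
      delta free-road acceleration exponent
    """
    dv = v - v_lead  # closing speed; positive when approaching the leader
    # Desired dynamic gap: standstill gap + headway term + braking term.
    s_star = s0 + max(0.0, v * T + v * dv / (2.0 * math.sqrt(a_max * b)))
    safe_gap = max(gap, 1e-3)  # guard against division by zero
    return a_max * (1.0 - (v / v0) ** delta - (s_star / safe_gap) ** 2)


def rollout(x=0.0, v=10.0, x_lead=60.0, dt=0.1, steps=300):
    """Euler-integrate one follower behind a stationary leader (hypothetical setup)."""
    min_gap = x_lead - x
    for _ in range(steps):
        a = idm_acceleration(v, 0.0, x_lead - x)
        v = max(0.0, v + a * dt)  # vehicles do not reverse
        x = x + v * dt
        min_gap = min(min_gap, x_lead - x)
    return x, v, min_gap


if __name__ == "__main__":
    x, v, min_gap = rollout()
    print(f"final position={x:.2f} m, final speed={v:.3f} m/s, min gap={min_gap:.2f} m")
```

In this sketch the leader is held stationary, mirroring the stopped agent in the figure; a multi-agent version would apply the same rule to each follower using its own leader's speed and gap.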