Scenario Dreamer: Vectorized Latent Diffusion for Generating Driving Simulation Environments

Luke Rowe, Roger Girgis, Anthony Gosselin, Liam Paull, Christopher Pal, Felix Heide

TL;DR

Scenario Dreamer addresses the need for scalable, realistic autonomous driving simulators with a fully data-driven, vectorized framework. It couples a vectorized latent diffusion model for initial scene generation with a return-conditioned Transformer for closed-loop agent behavior, and it extends scenes without bound via diffusion inpainting, which supports long-horizon rollouts. Compared to the strongest raster-based baseline, the scene-generation model achieves better generation quality with around 2x fewer parameters, 6x lower generation latency, and 10x fewer GPU training hours. RL planners also find the generated environments more challenging than traditional non-generative ones, which makes the simulator useful for planning research and safety evaluation.

Abstract

We introduce Scenario Dreamer, a fully data-driven generative simulator for autonomous vehicle planning that generates both the initial traffic scene - comprising a lane graph and agent bounding boxes - and closed-loop agent behaviours. Existing methods for generating driving simulation environments encode the initial traffic scene as a rasterized image and, as such, require parameter-heavy networks that perform unnecessary computation due to many empty pixels in the rasterized scene. Moreover, we find that existing methods that employ rule-based agent behaviours lack diversity and realism. Scenario Dreamer instead employs a novel vectorized latent diffusion model for initial scene generation that directly operates on the vectorized scene elements and an autoregressive Transformer for data-driven agent behaviour simulation. Scenario Dreamer additionally supports scene extrapolation via diffusion inpainting, enabling the generation of unbounded simulation environments. Extensive experiments show that Scenario Dreamer outperforms existing generative simulators in realism and efficiency: the vectorized scene-generation base model achieves superior generation quality with around 2x fewer parameters, 6x lower generation latency, and 10x fewer GPU training hours compared to the strongest baseline. We confirm its practical utility by showing that reinforcement learning planning agents are more challenged in Scenario Dreamer environments than in traditional non-generative simulation environments, especially in long and adversarial driving environments.

Paper Structure

This paper contains 11 sections, 2 equations, 3 figures, 1 table.

Figures (3)

  • Figure 1: Scenario Dreamer vectorized latent diffusion model for initial scene generation. Left: We embed each vectorized scene element into a latent representation with an autoencoder parameterized with factorized attention blocks, which additionally fuses the lane connectivity $\mathbf{A}$. The latent Transformer diffusion model $\epsilon_{\theta}$ is trained to sample from the autoencoder's latent distribution. Right: Scenario Dreamer samples novel driving scenes by initializing $N_o + N_l$ noise tokens which are iteratively denoised with $\epsilon_{\theta}$ over $T$ steps and decoded into vectorized scene elements. The ego vehicle is denoted in red, with other agents colored in blue and pedestrians in purple.
  • Figure 2: Vectorized environments generated by Scenario Dreamer with the proposed vectorized latent diffusion model trained on the Waymo dataset (top row) and nuPlan dataset (bottom row).
  • Figure 3: Illustrative example of Scenario Dreamer's inpainting capabilities, where the initial tile is outlined in solid lines and the inpainted tile in dashed lines. The model generates consistent lane geometries at the scene boundaries, even at complex intersections.
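
The sampling procedure described in the Figure 1 caption (initialize $N_o + N_l$ noise tokens, denoise them with $\epsilon_{\theta}$ over $T$ steps) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a standard DDPM ancestral sampler with a linear noise schedule, and the names `make_schedule`, `sample_scene_latents`, `eps_model`, and `known` are hypothetical. The optional `known` argument shows one common replacement-style way to condition on already-generated tokens, in the spirit of the inpainting in Figure 3; the paper's actual inpainting procedure may differ. Decoding the latents back into vectorized scene elements is left out.

```python
import math
import random


def make_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule (an assumption); returns (betas, alphas, alpha_bars)."""
    betas = [beta_start + (beta_end - beta_start) * t / max(T - 1, 1) for t in range(T)]
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a  # cumulative product of alphas up to step t
        alpha_bars.append(prod)
    return betas, alphas, alpha_bars


def sample_scene_latents(eps_model, n_agents, n_lanes, dim, T, rng, known=None):
    """DDPM ancestral sampling over N_o + N_l latent tokens.

    eps_model(x, t) is a hypothetical stand-in for the noise predictor: it takes
    a list of token vectors and a step index and returns predicted noise of the
    same shape. `known` optionally maps token index -> clean latent vector to be
    held fixed (replacement-style conditioning, assumed here for illustration).
    """
    betas, alphas, alpha_bars = make_schedule(T)
    n_tokens = n_agents + n_lanes
    # Start from pure Gaussian noise, one vector per scene element.
    x = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_tokens)]
    for t in reversed(range(T)):
        if known:
            # Overwrite known tokens with their clean latents noised to level t.
            s, n = math.sqrt(alpha_bars[t]), math.sqrt(1.0 - alpha_bars[t])
            for i, x0 in known.items():
                x[i] = [s * v + n * rng.gauss(0.0, 1.0) for v in x0]
        eps = eps_model(x, t)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        sigma = math.sqrt(betas[t]) if t > 0 else 0.0  # no noise on the final step
        x = [
            [(xi - coef * ei) / math.sqrt(alphas[t]) + sigma * rng.gauss(0.0, 1.0)
             for xi, ei in zip(row, erow)]
            for row, erow in zip(x, eps)
        ]
    if known:
        # Return the conditioning tokens exactly as given.
        for i, x0 in known.items():
            x[i] = list(x0)
    return x


if __name__ == "__main__":
    # Toy noise predictor that always predicts zero noise (illustration only).
    zero_eps = lambda x, t: [[0.0] * len(row) for row in x]
    latents = sample_scene_latents(zero_eps, n_agents=3, n_lanes=5, dim=4,
                                   T=10, rng=random.Random(0))
    print(len(latents), len(latents[0]))
```

In a real system `eps_model` would be the trained latent Transformer, and the returned tokens would be passed through the autoencoder's decoder to recover agent bounding boxes and lane geometry.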