SLEDGE: Synthesizing Driving Environments with Generative Models and Rule-Based Traffic

Kashyap Chitta; Daniel Dauner; Andreas Geiger

TL;DR

The paper addresses the generation of realistic, long-horizon driving scenes for planner evaluation, moving beyond log replay to a diffusion-based generative framework. It introduces a Rasterized Latent Map (RLM), learned with a Raster-to-Vector Autoencoder (RVAE), and uses a Diffusion Transformer (DiT) to generate lane graphs and agent states jointly, with city-conditioned generation and inpainting for route extrapolation. Key contributions are: formalizing abstract scene generation with a new benchmark and metrics; showing that RLM-based representations with channel masking approach ground-truth vector quality; and demonstrating SLEDGE, a storage-efficient simulator (<4 GB to set up) that supports density-controlled planning tests on routes up to 500 m long. The result is scalable, controllable, data-driven testing of motion planning algorithms that is more accessible than existing data-driven simulators.

Abstract

SLEDGE is the first generative simulator for vehicle motion planning trained on real-world driving logs. Its core component is a learned model that is able to generate agent bounding boxes and lane graphs. The model's outputs serve as an initial state for rule-based traffic simulation. The unique properties of the entities to be generated for SLEDGE, such as their connectivity and variable count per scene, render the naive application of most modern generative models to this task non-trivial. Therefore, together with a systematic study of existing lane graph representations, we introduce a novel raster-to-vector autoencoder. It encodes agents and the lane graph into distinct channels in a rasterized latent map. This facilitates both lane-conditioned agent generation and combined generation of lanes and agents with a Diffusion Transformer. Using generated entities in SLEDGE enables greater control over the simulation, e.g., upsampling turns or increasing traffic density. Further, SLEDGE can support 500 m long routes, a capability not found in existing data-driven simulators like nuPlan. It presents new challenges for planning algorithms, evidenced by failure rates of over 40% for PDM, the winner of the 2023 nuPlan challenge, when tested on hard routes and dense traffic generated by our model. Compared to nuPlan, SLEDGE requires 500$\times$ less storage to set up (<4 GB), making it a more accessible option and helping to democratize future research in this field.
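The abstract notes that placing lanes and agents in distinct channels of the latent map makes lane-conditioned agent generation possible. A minimal sketch of that idea follows, assuming a hypothetical 8-channel layout (4 lane channels, 4 agent channels) and using random values as a stand-in for a generative model's proposal. The names `LANE_CHANNELS`, `AGENT_CHANNELS`, `make_map`, and `lane_conditioned_blend` are illustrative and do not come from the paper; the paper's actual channel counts and conditioning mechanism may differ.

```python
import random

# Hypothetical channel layout (an assumption, not taken from the paper).
LANE_CHANNELS = range(0, 4)
AGENT_CHANNELS = range(4, 8)


def make_map(channels, size, rng):
    """Build a channels x size x size nested list of Gaussian samples."""
    return [
        [[rng.gauss(0.0, 1.0) for _ in range(size)] for _ in range(size)]
        for _ in range(channels)
    ]


def lane_conditioned_blend(known, proposal):
    """Keep lane channels from `known`; take agent channels from `proposal`."""
    blended = []
    for c, (known_ch, proposal_ch) in enumerate(zip(known, proposal)):
        source = known_ch if c in LANE_CHANNELS else proposal_ch
        blended.append([row[:] for row in source])  # copy rows, no aliasing
    return blended


rng = random.Random(0)
known = make_map(8, 4, rng)      # latent map with lanes that should be kept
proposal = make_map(8, 4, rng)   # stand-in for a generator's output
blended = lane_conditioned_blend(known, proposal)
```

In this sketch the lane channels act as fixed context and only the agent channels are replaced, so the same layout could serve joint generation simply by taking every channel from the proposal.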

Paper Structure (12 sections, 6 figures, 3 tables)

Introduction
Related Work
Method
nuPlan Vector Representation
Raster-to-Vector Autoencoder
Diffusion Transformer
SLEDGE Simulation Environments
Experiments
Lane Graph Representations
Lane Graph Generation
SLEDGE Simulation of PDM-Closed
Conclusion

Figures (6)

Figure 1: SLEDGE. We show state snapshots of simulation environments generated by our approach in 4 cities, with the lanes, ego-vehicle, other vehicles, pedestrians, obstacles, and traffic lights. Our supplementary video visualizes clips with more examples.
Figure 2: Rasterized State Image (RSI). We encode $\mathcal{S}$ into a 12-channel image, with 2 channels per entity type. We visualize these encodings as optical flow fields.
Figure 3: Raster-to-Vector Autoencoder (RVAE). We represent scenes with a rasterized latent map (RLM) consisting of two channel groups. The 'Lanes' group is decoded into lane segments and the 'Agents' group into all other scene entities, via a transformer decoder with attention masking. The autoencoder is trained to predict polylines, bounding boxes, and the ego velocity in a simulation-compatible format.
Figure 4: Route Extrapolation by Inpainting. We show an example scenario generated by our DiT, where we iteratively sample poses along a route, warp the previous tile's RSI to this pose, and generate a new tile conditioned on the warped RSI.
Figure 5: Long Route Simulation. SLEDGE supports (a) replayed scenarios, (b) lane-conditioned agent generation, and (c) joint lane and agent generation. Importantly, we enable testing on arbitrarily long routes by dynamically simulating agents near the ego vehicle while keeping the state of distant agents fixed.
...and 1 more figure
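The Figure 5 caption describes simulating agents near the ego vehicle dynamically while keeping distant agents fixed. The sketch below illustrates that gating under stated assumptions: a constant-velocity update stands in for the rule-based traffic model, and the `Agent` class, `step_agents` function, and the 64 m `active_radius` are hypothetical choices made for illustration, not values from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class Agent:
    x: float        # position in metres
    y: float
    speed: float    # metres per second
    heading: float  # radians


def step_agents(agents, ego_xy, active_radius, dt):
    """Advance only the agents within `active_radius` of the ego position."""
    ego_x, ego_y = ego_xy
    for agent in agents:
        distance = math.hypot(agent.x - ego_x, agent.y - ego_y)
        if distance <= active_radius:
            # Constant-velocity step (a stand-in for rule-based traffic).
            agent.x += agent.speed * math.cos(agent.heading) * dt
            agent.y += agent.speed * math.sin(agent.heading) * dt
        # Agents outside the radius keep their state unchanged.
    return agents


agents = [
    Agent(x=10.0, y=0.0, speed=5.0, heading=0.0),   # near the ego vehicle
    Agent(x=300.0, y=0.0, speed=5.0, heading=0.0),  # far from the ego vehicle
]
step_agents(agents, ego_xy=(0.0, 0.0), active_radius=64.0, dt=0.1)
```

Because the per-step cost depends only on the agents inside the radius, a scheme like this keeps simulation cost roughly independent of total route length, which is consistent with the caption's claim about arbitrarily long routes.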
