Driving scenario generation and evaluation using a structured layer representation and foundational models
Arthur Hubert, Gamal Elghazaly, Raphaël Frank
TL;DR
The paper tackles the problem of generating rare driving scenarios for autonomous vehicles by introducing a structured five-layer driving scenario model (5LM) that enables explainable generation and targeted evaluation. It combines large language models and world foundation models to extract real reference scenes from nuScenes, then edits targeted layers to produce edge cases, using both unstructured prompting and structured prompting with JSON templates. The authors propose text-based originality and diversity metrics computed per layer from semantic embeddings, along with the layer-wise structural metrics CD and CO, enabling nuanced comparisons to real references. Experimental results on nuScenes-mini show that structured prompting, particularly in a soft configuration, improves diversity while maintaining originality; qualitative video generation illustrates both the gains and the remaining coherence challenges. Overall, the framework offers a scalable, interpretable approach to synthetic driving data augmentation and evaluation, with room to extend temporal fidelity and improve generative consistency.
Abstract
Rare and challenging driving scenarios are critical for autonomous vehicle development. Because they are difficult to encounter, simulating them or generating them with generative models is a popular approach. Following previous efforts to structure driving scenario representations in a layer model, we propose a structured five-layer model to improve the evaluation and generation of rare scenarios. We use this model alongside large foundational models to generate new driving scenarios using a data augmentation strategy. Unlike previous representations, our structure introduces subclasses and characteristics for every agent of the scenario, allowing us to compare them using an embedding specific to our layer model. We study and adapt two metrics to evaluate the relevance of a synthetic dataset in the context of a structured representation: the diversity score estimates how different the scenarios of a dataset are from one another, while the originality score measures how similar a synthetic dataset is to a real reference set. This paper showcases both metrics in different generation setups, as well as a qualitative evaluation of synthetic videos generated from structured scenario descriptions. The code and extended results can be found at https://github.com/Valgiz/5LMSG.
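To make the two scores more concrete, the sketch below shows one plausible way to compute them from scenario embeddings. It is an illustration under stated assumptions, not the definition used in the paper: diversity is taken here as the mean pairwise cosine distance within a set, and originality as the mean cosine distance from each synthetic scenario to its nearest real reference. The function names `cosine_distance`, `diversity_score`, and `originality_score` are hypothetical, and the embeddings are toy vectors rather than outputs of the layer-specific embedding.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Return 1 - cosine similarity; zero vectors are treated as maximally distant."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def diversity_score(embeddings):
    """Assumed definition: mean pairwise cosine distance within one dataset."""
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        return 0.0
    return sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)


def originality_score(synthetic, reference):
    """Assumed definition: mean distance from each synthetic item to its nearest reference."""
    if not synthetic or not reference:
        raise ValueError("both sets must be non-empty")
    nearest = [min(cosine_distance(s, r) for r in reference) for s in synthetic]
    return sum(nearest) / len(nearest)


# Toy 2-D embeddings (hypothetical values, for illustration only).
reference = [[1.0, 0.0], [0.0, 1.0]]
synthetic = [[1.0, 0.0], [1.0, 1.0]]

print(diversity_score(reference))
print(originality_score(synthetic, reference))
```

For a per-layer evaluation in the spirit of the abstract, the same two functions could be called once per layer, each time on the embeddings of that layer only, so that a change in one layer is not averaged away by the others.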
