Driving scenario generation and evaluation using a structured layer representation and foundational models

Arthur Hubert, Gamal Elghazaly, Raphaël Frank

TL;DR

The paper tackles the problem of generating rare driving scenarios for autonomous vehicles by introducing a structured five-layer driving scenario model (5LM) that enables explainable generation and targeted evaluation. It combines large language models and world foundation models: real reference scenes are extracted from nuScenes, and targeted layers are then edited to produce edge cases, using either unstructured prompting or structured prompting with JSON templates. The authors propose text-based originality and diversity metrics computed per layer from semantic embeddings, along with the layer-wise structural metrics CD and CO, enabling nuanced comparisons to real references. Experimental results on nuScenes-mini show that structured prompting, particularly in a soft configuration, improves diversity while maintaining originality; qualitative video generation illustrates both the gains and the remaining coherence challenges. Overall, the framework offers a scalable, interpretable approach to synthetic driving data augmentation and evaluation, with room to extend temporal fidelity and improve generative consistency.

Abstract

Rare and challenging driving scenarios are critical for autonomous vehicle development. Since they are difficult to encounter, simulating or generating them using generative models is a popular approach. Following previous efforts to structure driving scenario representations in a layer model, we propose a structured five-layer model to improve the evaluation and generation of rare scenarios. We use this model alongside large foundational models to generate new driving scenarios using a data augmentation strategy. Unlike previous representations, our structure introduces subclasses and characteristics for every agent of the scenario, allowing us to compare them using an embedding specific to our layer model. We study and adapt two metrics to evaluate the relevance of a synthetic dataset in the context of a structured representation: the diversity score estimates how different the scenarios of a dataset are from one another, while the originality score calculates how similar a synthetic dataset is to a real reference set. This paper showcases both metrics in different generation setups, as well as a qualitative evaluation of synthetic videos generated from structured scenario descriptions. The code and extended results can be found at https://github.com/Valgiz/5LMSG.
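
As a rough illustration of the two scores described in the abstract, the sketch below computes them from scenario embeddings. The exact formulas are not given in this summary, so everything here is an assumption: cosine distance as the semantic distance, diversity as the mean pairwise distance within one set, and originality as the mean distance from each generated scenario to its nearest reference scenario. The function names and the toy vectors are hypothetical.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """1 - cosine similarity between two non-zero embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def diversity(embeddings):
    """Assumed definition: mean pairwise distance within one dataset."""
    pairs = list(combinations(embeddings, 2))
    return sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)


def originality(generated, reference):
    """Assumed definition: mean distance from each generated scenario
    to its closest reference scenario."""
    nearest = [min(cosine_distance(g, r) for r in reference) for g in generated]
    return sum(nearest) / len(nearest)


# Toy 2-D embeddings standing in for real sentence embeddings.
reference_set = [[1.0, 0.0], [0.0, 1.0]]
generated_set = [[1.0, 0.0], [-1.0, 0.0]]

div_score = diversity(reference_set)
orig_score = originality(generated_set, reference_set)
print(div_score, orig_score)
```

In a per-layer evaluation, the same two functions would simply be called once per layer on the embeddings of that layer's description.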

Paper Structure

This paper contains 23 sections, 7 equations, 7 figures, 4 tables.

Figures (7)

  • Figure 1: Proposed method for generating and evaluating new, diverse scenarios from real scenes. Our generation strategy relies on editing real scenarios after representing them in a 5-layer model.
  • Figure 2: Snapshot of the structured 5-layer model representation. The Enum fields correspond to a list of available choices for the model, while the str fields are free-form sections in which the model can specify additional information. Some fields, such as structures (L2), dynamic objects (L4), or illumination (L5), expect a list of all relevant objects fitting that category within the scene.
  • Figure 3: Application of the layer model to scenario generation. The LLM is prompted to edit specific layers one at a time, which increases our control over the generation process. Similarly, the evaluation is done layer by layer for better scene understanding.
  • Figure 4: Diversity and Originality are calculated using the semantic distances between the embeddings of the generated scenarios $S_o$ and the reference scenarios $S_b$.
  • Figure 5: Example of an edited layer 4 from scene 1 of nuScenes in the structured 5LM. The truck and mattress have been added to the scene.
  • ...and 2 more figures
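
To make the captions of Figures 2 and 5 more concrete, here is a minimal sketch of how a structured scenario with Enum and str fields could be represented and how a single layer could be edited. It is not the authors' schema: the class names, the Enum choices, the field names, and the example values are all hypothetical, and only the three list-valued fields named in the Figure 2 caption (structures, dynamic objects, illumination) are modeled.

```python
import copy
import json
from dataclasses import asdict, dataclass, field
from enum import Enum
from typing import List


class ObjectClass(Enum):
    """Hypothetical Enum choices for a dynamic object."""
    CAR = "car"
    TRUCK = "truck"
    PEDESTRIAN = "pedestrian"
    OBSTACLE = "obstacle"


@dataclass
class DynamicObject:
    object_class: ObjectClass  # Enum field: constrained choice
    description: str = ""      # str field: free-form details


@dataclass
class Scenario:
    # Remaining layers are omitted because this summary does not name them.
    structures: List[str] = field(default_factory=list)                 # L2
    dynamic_objects: List[DynamicObject] = field(default_factory=list)  # L4
    illumination: List[str] = field(default_factory=list)               # L5


def edit_layer4(scenario, new_objects):
    """Return a copy with extra dynamic objects; other layers are untouched."""
    edited = copy.deepcopy(scenario)
    edited.dynamic_objects.extend(new_objects)
    return edited


def to_json(scenario):
    """Serialize to JSON, writing Enum members as their string values."""
    return json.dumps(asdict(scenario), default=lambda o: o.value, indent=2)


# Made-up reference scene, edited in the spirit of Figure 5.
reference = Scenario(
    structures=["building"],
    dynamic_objects=[DynamicObject(ObjectClass.CAR, "parked car")],
    illumination=["daylight"],
)
edited = edit_layer4(reference, [
    DynamicObject(ObjectClass.TRUCK, "truck ahead of the ego vehicle"),
    DynamicObject(ObjectClass.OBSTACLE, "mattress lying on the road"),
])
print(to_json(edited))
```

Restricting an edit to one layer, as `edit_layer4` does, keeps the other layers identical to the reference scene, which is what makes a layer-by-layer comparison meaningful.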