Scenario Extraction from a Large Real-World Dataset for the Assessment of Automated Vehicles

Detian Guo; Manuel Muñoz Sánchez; Erwin de Gelder; Tom P. J. van der Sande

TL;DR

This work addresses the extraction of safety-critical driving scenarios from large real-world datasets for automated vehicle (AV) validation. It introduces a three-step pipeline: (1) data preprocessing; (2) tagging of actor activities, actor-environment interactions, and inter-actor interactions; and (3) scenario categorization by matching tag combinations. The approach is validated on data simulated with CARLA and applied to part of the Waymo Open Motion Dataset, yielding 215,090 scenarios across multiple categories, and it is accompanied by an open-source codebase. The results demonstrate the feasibility of scalable, scenario-based AV assessment and provide a practical foundation for building a real-world scenario database aligned with operational design domains.

Abstract

Many players in the automotive field support scenario-based assessment of automated vehicles (AVs), in which individual traffic situations are tested so that conclusions can be drawn about the performance of AVs in different situations. Since a large number of different scenarios can occur in real-world traffic, the question is how to find a finite set of relevant scenarios. Scenarios extracted from large real-world datasets represent real-world traffic because real driving data is used. Extracting scenarios, however, is challenging because (1) the scenarios to be tested should assess whether the AVs behave safely, which conflicts with the fact that the majority of the data contains scenarios that are not interesting from a safety perspective, and (2) extensive data processing is required, which hinders the utilization of large real-world datasets. In this work, we propose an approach for extracting scenarios from real-world driving data. The first step is data preprocessing, which tackles the errors and noise in real-world data by reconstructing the data. The second step is data tagging, which labels actors' activities and their interactions with each other and with the environment. Finally, the scenarios are extracted by searching for combinations of tags. The proposed approach is evaluated using data simulated with CARLA and applied to a part of a large real-world driving dataset, i.e., the Waymo Open Motion Dataset (WOMD). The code and the scenarios extracted from WOMD are open to the research community to facilitate the assessment of automated driving functions in different scenarios.
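The final step described in the abstract, extracting scenarios by searching for combinations of tags, can be illustrated with a minimal sketch. This is not the authors' implementation: the `TagInterval` type, the `find_scenarios` function, the discrete time-step representation, and the tag names are all hypothetical and chosen only for illustration. The sketch assumes each tag is stored as an interval of time steps and reports the windows in which every required tag is active at once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagInterval:
    """A tag that holds from time step `start` to `end` (both inclusive)."""
    tag: str
    start: int
    end: int


def find_scenarios(intervals, required_tags):
    """Return (start, end) windows in which all required tags are active."""
    if not required_tags:
        return []
    # Collect, per required tag, the set of time steps at which it is active.
    active = {tag: set() for tag in required_tags}
    for iv in intervals:
        if iv.tag in active:
            active[iv.tag].update(range(iv.start, iv.end + 1))
    # Keep only the time steps at which every required tag is active.
    common = sorted(set.intersection(*active.values()))
    # Merge consecutive time steps into contiguous windows.
    windows = []
    for k in common:
        if windows and k == windows[-1][1] + 1:
            windows[-1] = (windows[-1][0], k)
        else:
            windows.append((k, k))
    return windows


# Hypothetical tags for one recording (names are illustrative only).
tags = [
    TagInterval("ego_turning_left", 10, 40),
    TagInterval("oncoming_vehicle_in_proximity", 25, 60),
    TagInterval("ego_braking", 0, 5),
]
category = {"ego_turning_left", "oncoming_vehicle_in_proximity"}
print(find_scenarios(tags, category))  # → [(25, 40)]
```

Representing a scenario category as a set of required tags keeps the matching step independent of how the tags were produced, so the same search could run on simulated and real-world data alike.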

Paper Structure (11 sections, 7 equations, 5 figures, 4 tables)

Introduction
Related work
Methodology
Data preprocessing
Tagging
Actor activity
Actor-environment interaction
Interaction between different actors
Scenario categorization using tags
Results
Conclusion and future work

Figures (5)

Figure 1: Overview of the traffic at a crossroads, where different road users (RUs) interact with each other and with the environment. Vehicles: V1-V6, pedestrians: P1-P2, cyclists: C1-C2.
Figure 2: The three-step approach for scenario extraction from real-world data. The colors green and blue refer to actor-related and environment-related processes and tags, respectively.
Figure 3: The expanded bounding boxes $\mathcal{B}_{\mathrm{e}}^V$ and $\mathcal{B}_{\mathrm{e}}^C$ for tagging the interactive actors in close proximity. $C$: cyclist, $V$: vehicle.
Figure 4: The predicted bounding boxes $\mathcal{B}_{\mathrm{p}}^P$ and $\mathcal{B}_{\mathrm{p}}^V$ for tagging the interactive actors with estimated collision. $P$: pedestrian, $V$: vehicle, $\mathcal{B}_{\mathrm{p}}(k_{\mathrm{p}})$: $\mathcal{B}_{\mathrm{p}}$ at the time step $k_{\mathrm{p}}$.
Figure 5: Examples for the extracted scenarios. The origin of the coordinate system is an arbitrary point.
