Scenario Extraction from a Large Real-World Dataset for the Assessment of Automated Vehicles
Detian Guo, Manuel Muñoz Sánchez, Erwin de Gelder, Tom P. J. van der Sande
TL;DR
This work addresses the problem of extracting safety-critical driving scenarios from large real-world datasets for automated vehicle validation. It introduces a three-step pipeline: data preprocessing; tagging of actor activities, actor-environment interactions, and inter-actor interactions; and scenario categorization by matching tag combinations. The approach is validated in CARLA simulations and applied to the Waymo Open Motion Dataset, yielding 215,090 scenarios across multiple categories, and is accompanied by an open-source codebase. The results demonstrate the feasibility of scalable, scenario-based AV assessment and provide a practical foundation for building a real-world scenario database aligned with operational design domains.
Abstract
Many players in the automotive field support scenario-based assessment of automated vehicles (AVs), in which individual traffic situations are tested so that conclusions can be drawn about the performance of AVs in different situations. Since a large number of different scenarios can occur in real-world traffic, the question is how to find a finite set of relevant scenarios. Scenarios extracted from large real-world datasets represent real-world traffic since real driving data is used. Extracting scenarios, however, is challenging because (1) the scenarios to be tested should assess whether the AVs behave safely, which conflicts with the fact that the majority of the data contains scenarios that are not interesting from a safety perspective, and (2) extensive data processing is required, which hinders the utilization of large real-world datasets. In this work, we propose an approach for extracting scenarios from real-world driving data. The first step is data preprocessing, which tackles the errors and noise in real-world data by reconstructing the data. The second step performs data tagging to label actors' activities, their interactions with each other, and their interactions with the environment. Finally, the scenarios are extracted by searching for combinations of tags. The proposed approach is evaluated using data simulated with CARLA and applied to a part of a large real-world driving dataset, i.e., the Waymo Open Motion Dataset (WOMD). The code and scenarios extracted from WOMD are open to the research community to facilitate the assessment of automated driving functions in different scenarios.
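The final step, extracting scenarios by searching for combinations of tags, can be illustrated with a minimal sketch. It is not the authors' released code: the tag channel names, the labels, the category name, and the `extract_scenarios` helper are all hypothetical, and the sketch assumes that tagging has already produced one label per frame for each channel. A scenario is then any maximal run of frames in which every required tag holds at the same time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    category: str
    start: int  # first frame index (inclusive)
    end: int    # last frame index (inclusive)


def extract_scenarios(tags, category, required, min_frames=1):
    """Return maximal frame intervals in which all required tags hold.

    tags: dict mapping a tag channel name to a per-frame list of labels.
    required: dict mapping a channel name to the label it must carry.
    Both the channel names and the labels are illustrative assumptions.
    """
    n_frames = min(len(tags[channel]) for channel in required)
    scenarios = []
    start = None
    # Iterate one frame past the end so an open interval is always closed.
    for frame in range(n_frames + 1):
        match = frame < n_frames and all(
            tags[channel][frame] == label for channel, label in required.items()
        )
        if match and start is None:
            start = frame
        elif not match and start is not None:
            if frame - start >= min_frames:
                scenarios.append(Scenario(category, start, frame - 1))
            start = None
    return scenarios


# Hypothetical tags describing one other actor relative to the ego vehicle.
tags = {
    "other_longitudinal": ["cruising", "cruising", "decelerating", "decelerating",
                           "decelerating", "cruising", "decelerating", "cruising"],
    "other_lane": ["same"] * 8,
    "other_relative_position": ["front", "front", "front", "front",
                                "front", "front", "rear", "front"],
}

# Hypothetical tag combination defining a "lead vehicle decelerating" category.
lead_braking = {
    "other_longitudinal": "decelerating",
    "other_lane": "same",
    "other_relative_position": "front",
}

found = extract_scenarios(tags, "lead_vehicle_decelerating", lead_braking, min_frames=2)
```

In this toy input, frames 2 to 4 satisfy all three required tags, so a single scenario is returned. Frame 6 is rejected because the other actor is tagged as being behind the ego vehicle. The `min_frames` threshold is an added assumption intended to suppress very short matches caused by tag flicker.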
