Integrated Scenario-based Analysis: A data-driven approach to support automated driving systems development and safety evaluation
Gibran Ali, Kaye Sullivan, Eileen Herbers, Vicki Williams, Dustin Holley, Jacobo Antona-Makoshi, Kevin Kefauver
TL;DR
The paper addresses the limited representativeness of traditional scenario-based ADAS/ADS development by proposing an integrated, data-driven framework that fuses multiple US data sources across the severity spectrum. It demonstrates the approach on the illustrative scenario of turns at intersections, combining 10 years of national crash data with SHRP2 naturalistic driving data. Exposure is estimated via $VMT_{est} = \sum_{CY=a}^{b}\sum_{MY=x}^{CY+1} VIO_{(CY,MY)} \times AAM_{CY-MY}$ so that event rates from the different datasets share a common denominator. The framework provides scenario frequencies, parameter distributions, and concrete test-case generation workflows, enabling context-rich, multi-year scenarios and facilitating data-driven simulation and testing with OpenDRIVE/OpenSCENARIO representations. Overall, this integrated methodology extends scenario coverage from routine driving to fatal crashes, supporting more robust ADAS/ADS development and safety evaluation in real-world operating conditions.
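The exposure formula above can be read as a double sum over calendar years (CY) and model years (MY). The following is a minimal sketch of that sum, not the authors' implementation. It assumes that VIO denotes a count of vehicles in operation for a given calendar year and model year, and that AAM denotes average annual miles indexed by vehicle age CY - MY; the summary does not expand these abbreviations. The function name, the dictionary layout, and the numbers in the usage example are hypothetical placeholders.

```python
def estimate_vmt(vio, aam, cy_start, cy_end, my_start):
    """Sum VIO[(CY, MY)] * AAM[CY - MY] over CY in [a, b] and MY in [x, CY + 1].

    vio: dict mapping (calendar_year, model_year) -> vehicle count (assumed meaning)
    aam: dict mapping vehicle age (CY - MY) -> average annual miles (assumed meaning)
    Missing entries are treated as zero, which is a simplification.
    """
    total = 0.0
    for cy in range(cy_start, cy_end + 1):
        # The inner sum runs through MY = CY + 1, so the age can be -1
        # (next-model-year vehicles already on the road).
        for my in range(my_start, cy + 2):
            age = cy - my
            total += vio.get((cy, my), 0) * aam.get(age, 0.0)
    return total


# Usage with made-up placeholder values (not data from the paper).
vio_example = {(2020, 2019): 100, (2020, 2020): 50, (2020, 2021): 10}
aam_example = {1: 10000.0, 0: 12000.0, -1: 3000.0}
vmt_example = estimate_vmt(vio_example, aam_example, 2020, 2020, 2019)
print(vmt_example)
```

Treating missing VIO or AAM entries as zero keeps the sketch short; a real analysis would more likely flag such gaps than silently drop them.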
Abstract
Several scenario-based frameworks exist to aid in vehicle system development and safety assurance. However, there is a need for approaches that combine different types of datasets that offer varying levels of case severity, data richness, and representativeness. This study presents an integrated scenario-based analysis approach that encompasses scenario definition, fusion, parametrization, and test case generation. For this process, ten years of fatal and non-fatal national crash data from the United States are combined with over 34 million miles of naturalistic driving data. An illustrative example scenario, "turns at intersection", is chosen to demonstrate this approach. First, scenario definitions are established from both record-based and continuous time series data. Second, a frequency analysis is performed to understand how often events from the same scenario occur at different severities across datasets. Third, an analysis is performed to show the key factors relevant to the scenario and the distribution of various parameters. Finally, a method to combine both types of data into representative test case scenarios is presented. These techniques improve scenario representativeness in two major ways: first, they populate an entire spectrum of cases ranging from routine events to fatal crashes; and second, they provide context-rich, multi-year data by combining large-scale national and naturalistic datasets.
