Towards Scenario- and Capability-Driven Dataset Development and Evaluation: An Approach in the Context of Mapless Automated Driving
Felix Grün, Marcus Nolte, Markus Maurer
TL;DR
This paper tackles the data bottleneck in mapless automated driving by introducing a scenario- and capability-driven dataset development framework grounded in ISO 21448 (SOTIF) and ISO/TR 4804 Annex B. It uses capability graphs to translate operational driving scenarios into concrete dataset requirements, demonstrates the approach with two driving scenarios, and then reviews existing lane detection datasets against those requirements. The analysis highlights gaps in labeling granularity, driving-direction information, and occlusion handling, and shows that no single dataset fully supports mapless driving tasks such as complex lane changes. The proposed framework provides a structured method for deriving dataset requirements and enables meaningful cross-dataset comparisons that can guide future data collection and labeling efforts, supporting safer and more capable perception systems for mapless driving.
Abstract
The foundational role of datasets in defining the capabilities of deep learning models has led to their rapid proliferation. At the same time, published research on the process of dataset development for environment perception in automated driving has been scarce, which reduces the applicability of openly available datasets and impedes the development of effective environment perception systems. Sensor-based, mapless automated driving is one of the contexts where this limitation is evident. While relying on real-time sensor data instead of predefined HD maps promises enhanced adaptability and safety when the environment changes unexpectedly, it also increases the demands on the scope and complexity of the information provided by the perception system. To address these challenges, we propose a scenario- and capability-based approach to dataset development. Grounded in the principles of ISO 21448 (safety of the intended functionality, SOTIF), extended by ISO/TR 4804, our approach facilitates the structured derivation of dataset requirements. This not only aids in the development of meaningful new datasets but also enables the effective comparison of existing ones. Applying this methodology to a broad range of existing lane detection datasets, we identify significant limitations, particularly in terms of real-world applicability, a lack of labeling of critical features, and an absence of comprehensive information for complex driving maneuvers.
