Dataset Safety in Autonomous Driving: Requirements, Risks, and Assurance

Alireza Abbaspour, Tejaskumar Balgonda Patil, B Ravi Kiran, Russel Mohr, Senthil Yogamani

TL;DR

This paper addresses dataset safety in autonomous driving by proposing an ISO/PAS 8800-aligned framework that integrates a data-centric Data Flywheel within a structured dataset safety lifecycle. It details end-to-end data management, from collection and annotation to curation and maintenance, coupled with formal safety analyses (HAZOP, FTA, FMEA, STPA) and verification and validation strategies to ensure that safety requirements are met across the operational design domain (ODD). Key contributions include an explicit mapping of AI safety requirements to dataset specifications, a lifecycle model with clearly defined blocks and processes, and practical guidance on the design, implementation, and maintenance of safe, auditable datasets. The work emphasizes the need for traceability, distribution-shift monitoring, data leakage prevention, and scalable, automated tooling. It provides a foundation for safer AI systems in autonomous driving and outlines future directions such as rapid data construction and robust dataset security.

Abstract

Dataset integrity is fundamental to the safety and reliability of AI systems, especially in autonomous driving. This paper presents a structured framework for developing safe datasets aligned with ISO/PAS 8800 guidelines. Using AI-based perception systems as the primary use case, it introduces the AI Data Flywheel and the dataset lifecycle, covering data collection, annotation, curation, and maintenance. The framework incorporates rigorous safety analyses to identify hazards and mitigate risks caused by dataset insufficiencies. It also defines processes for establishing dataset safety requirements and proposes verification and validation strategies to ensure compliance with safety standards. In addition to outlining best practices, the paper reviews recent research and emerging trends in dataset safety and autonomous vehicle development, providing insights into current challenges and future directions. By integrating these perspectives, the paper aims to advance robust, safety-assured AI systems for autonomous driving applications.

Paper Structure

This paper contains 48 sections, 5 figures, 7 tables.

Figures (5)

  • Figure 1: Components of a typical Autonomous Driving Pipeline.
  • Figure 2: Data flywheel spanning data collection, data quality and diversification, model training, automated model-based annotation (camera+LiDAR DNN), and automated annotation quality checks.
  • Figure 3: Automated data (file) selection pipeline, with various configurations, that retrieves files satisfying the requirements and metadata attributes, with filtering via OpenStreetMap and multimodal image-text embeddings.
  • Figure 4: The automated annotation quality check model, a semantic segmentation pipeline based on SAM and OpenCLIP.
  • Figure 5: Dataset lifecycle recommended by ISO/PAS 8800.