Bridging the Data Gap in AI Reliability Research and Establishing DR-AIR, a Comprehensive Data Repository for AI Reliability
Simin Zheng, Jared M. Clark, Fatemeh Salboukh, Priscila Silva, Karen da Mata, Fenglian Pan, Jie Min, Jiayi Lian, Caleb B. King, Lance Fiondella, Jian Liu, Xinwei Deng, Yili Hong
TL;DR
This paper addresses the critical data gap in AI reliability research by defining key reliability metrics and data types, surveying existing datasets, and introducing DR-AIR, a public repository for AI reliability data. It outlines data-collection designs, including laboratory, field, virtual, and physical testing, and advocates design of experiments (DoE) and accelerated life testing (ALT) to speed up data gathering. The work catalogs multiple datasets at the incident, algorithm, module, and system levels, with detailed data dictionaries and illustrative analyses that demonstrate modeling approaches such as the nonhomogeneous Poisson process (NHPP) with Weibull-based baseline intensities. By providing DR-AIR and issuing a call to action for community contributions, the paper aims to standardize AI reliability data, improve cross-domain collaboration, and advance reliability-informed AI deployments across industries.
Abstract
Artificial intelligence (AI) technology and systems have been advancing rapidly. However, ensuring the reliability of these systems is crucial for fostering public confidence in their use. This necessitates the modeling and analysis of reliability data specific to AI systems. A major challenge in AI reliability research, particularly for those in academia, is the lack of readily available AI reliability data. To address this gap, this paper focuses on conducting a comprehensive review of available AI reliability data and establishing DR-AIR: a data repository for AI reliability. Specifically, we introduce key measurements and data types for assessing AI reliability, along with the methodologies used to collect these data. We also provide a detailed description of the currently available datasets with illustrative examples. Furthermore, we outline the setup of the DR-AIR repository and demonstrate its practical applications. This repository provides easy access to datasets specifically curated for AI reliability research. We believe these efforts will significantly benefit the AI research community by facilitating access to valuable reliability data and promoting collaboration across various academic domains within AI. We conclude our paper with a call to action, encouraging the research community to contribute and share AI reliability data to further advance this critical field of study.
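To make the NHPP modeling mentioned in the TL;DR concrete, the following is a minimal sketch, not the paper's own analysis. It assumes a power-law NHPP, whose intensity (beta/eta) * (t/eta)^(beta - 1) has the same form as a Weibull hazard, and uses the standard closed-form maximum likelihood estimates for a process observed over a fixed window (0, T]. The function names and the event times are hypothetical and chosen only for illustration.

```python
import math


def fit_power_law_nhpp(event_times, end_time):
    """Closed-form MLE for a power-law NHPP observed on (0, end_time].

    Assumed intensity: (beta / eta) * (t / eta) ** (beta - 1).
    Returns the estimates (beta, eta).
    """
    n = len(event_times)
    if n == 0:
        raise ValueError("at least one event is required")
    if any(t <= 0 or t > end_time for t in event_times):
        raise ValueError("event times must lie in (0, end_time]")
    log_sum = sum(math.log(end_time / t) for t in event_times)
    if log_sum == 0:
        raise ValueError("estimates are undefined when all events occur at end_time")
    beta = n / log_sum                    # shape: beta < 1 suggests reliability growth
    eta = end_time / n ** (1.0 / beta)    # scale, in the same units as the event times
    return beta, eta


def expected_events(t, beta, eta):
    """Cumulative intensity (expected number of events) up to time t."""
    return (t / eta) ** beta


# Hypothetical failure-event times (for example, days since deployment).
times = [5.0, 12.0, 30.0, 55.0, 90.0]
beta_hat, eta_hat = fit_power_law_nhpp(times, end_time=100.0)
print(beta_hat, eta_hat, expected_events(100.0, beta_hat, eta_hat))
```

A useful sanity check on this estimator is that the fitted cumulative intensity at the end of the observation window equals the observed number of events.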
