AI data transparency: an exploration through the lens of AI incidents

Sophia Worth, Ben Snaith, Arunav Das, Gefion Thuermer, Elena Simperl

TL;DR

Low data transparency persists across a wide range of AI systems. In addition, limited transparency and explainability at the model and system level make it harder to investigate data transparency information in response to public concerns about AI systems.

Abstract

Knowing more about the data used to build AI systems is critical for allowing different stakeholders to play their part in ensuring responsible and appropriate deployment and use. Meanwhile, a 2023 report shows that data transparency lags significantly behind other areas of AI transparency in popular foundation models. In this research, we sought to build on these findings by exploring the status of public documentation about data practices within AI systems that have generated public concern. Our findings demonstrate that low data transparency persists across a wide range of systems, and further that issues of transparency and explainability at the model and system level create barriers to investigating data transparency information to address public concerns about AI systems. We highlight a need to develop systematic ways of monitoring AI data transparency that account for the diversity of AI system types, and for such efforts to build on a further understanding of the needs of those both supplying and using data transparency information.

Paper Structure (26 sections, 4 figures, 2 tables)

Figures (4)

  • Figure 1: Figure from Bommasani et al. (2023), in their Foundation Model Transparency Index paper, comparing transparency across 10 key foundation models and 10 aspects of AI ecosystem transparency. Their ‘data layer’ includes data, labour and compute factors.
  • Figure 2: Overview of methodology
  • Figure 3: AI models scoring a point for each data indicator in this research (n=25) in comparison to the findings of the Foundation Model Transparency Index (n=10) (Bommasani et al., 2023)
  • Figure 4: Comparing number of data transparency indicators across all AI models analysed