An Investigation of Experiences Engaging the Margins in Data-Centric Innovation

Gabriella Thompson, Ebtesam Al Haque, Paulette Blanc, Meme Styles, Denae Ford, Angela D. R. Smith, Brittany Johnson

TL;DR

Gaps in data representation limit equitable data-centric innovation. The paper investigates this by surveying 261 technologists and researchers about their dataset-seeking experiences. The authors analyze how factors such as cost, diversity, trust, and data quantity influence dataset decisions, and how age and person-of-color (POC) identity relate to these factors and to barriers in obtaining diverse, trustworthy data. Using chi-square, Fisher's exact, and nonparametric tests in Python/Pandas, they find significant associations: for example, participants under 35 place more value on diversity and older participants emphasize trust, while POC participants report greater trust-related difficulty. The work highlights systemic inequities in data access and outlines directions for more inclusive data practices and future research.

Abstract

Data-centric technologies provide exciting opportunities, but recent research has shown how lack of representation in datasets, often as a result of systemic inequities and socioeconomic disparities, can produce inequitable outcomes that can exclude or harm certain demographics. In this paper, we discuss preliminary insights from an ongoing effort aimed at better understanding barriers to equitable data-centric innovation. We report findings from a survey of 261 technologists and researchers who use data in their work regarding their experiences seeking adequate, representative datasets. Our findings suggest that age and identity play a significant role in the seeking and selection of representative datasets, warranting further investigation into these aspects of data-centric research and development.
