A Taxonomy of Challenges to Curating Fair Datasets

Dora Zhao, Morgan Klaus Scheuerman, Pooja Chitre, Jerone T. A. Andrews, Georgia Panagiotidou, Shawn Walker, Kathleen H. Pine, Alice Xiang

TL;DR

This paper presents a comprehensive taxonomy of the challenges and trade-offs encountered throughout the dataset curation lifecycle and underscores overarching issues within the broader fairness landscape that impact data curation.

Abstract

Despite extensive efforts to create fairer machine learning (ML) datasets, there remains a limited understanding of the practical aspects of dataset curation. Drawing from interviews with 30 ML dataset curators, we present a comprehensive taxonomy of the challenges and trade-offs encountered throughout the dataset curation lifecycle. Our findings underscore overarching issues within the broader fairness landscape that impact data curation. We conclude with recommendations aimed at fostering systemic changes to better facilitate fair dataset curation practices.

Paper Structure

This paper contains 43 sections, 2 figures, 4 tables.

Figures (2)

  • Figure 1: A circular process diagram showing how each challenge we identified maps to each phase and subphase of the dataset lifecycle.
  • Figure 2: A social ecological model (Bronfenbrenner, 1994) of the challenges in each layer of the overarching fairness landscape. The model shows how the layers are nested but interconnected.