
Fairness Testing: A Comprehensive Survey and Analysis of Trends

Zhenpeng Chen, Jie M. Zhang, Max Hort, Mark Harman, Federica Sarro

TL;DR

This survey provides a comprehensive overview of fairness testing for ML software. It collects 100 papers and organizes them along two axes: the testing workflow (how to test) and the testing components (what to test). It formalizes the concepts of fairness bugs and fairness testing, surveys definitions of individual and group fairness, and details approaches to test-input generation and to test oracles. The study analyzes research trends across publication venues, data types, and tasks, catalogs public datasets and tools, and highlights open opportunities in multi-attribute fairness, test adequacy, and fairness testing across domains, including deep learning (DL) and domain-specific applications. The findings offer a practical roadmap for researchers and practitioners working toward fairer ML systems, with emphasis on scalable testing methods, reliable test oracles, and broader stakeholder engagement.
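The TL;DR mentions group fairness, individual fairness, test-input generation, and test oracles without spelling them out. The following is a minimal Python sketch, under stated assumptions, of one common group-fairness metric (statistical parity difference) and one simple individual-fairness test. The test generates inputs by flipping a protected attribute and uses "the decision should not change" as its oracle. This is not a method taken from the survey. The flip test is a simplification that treats two individuals differing only in the protected attribute as "similar". The function names, the toy model, and the toy data are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence

# Assumption: a "model" is any callable mapping a feature dict to a 0/1 decision.
Model = Callable[[Dict[str, object]], int]


def statistical_parity_difference(
    preds: Sequence[int], groups: Sequence[str], privileged: str
) -> float:
    """Group fairness: P(pred=1 | unprivileged) - P(pred=1 | privileged).

    A value of 0 means both groups receive favorable outcomes at the same rate.
    """
    priv = [p for p, g in zip(preds, groups) if g == privileged]
    unpriv = [p for p, g in zip(preds, groups) if g != privileged]
    if not priv or not unpriv:
        raise ValueError("both groups must be non-empty")
    return sum(unpriv) / len(unpriv) - sum(priv) / len(priv)


def find_individual_discrimination(
    model: Model,
    inputs: List[Dict[str, object]],
    protected: str,
    values: Sequence[object],
) -> List[Dict[str, object]]:
    """Individual fairness test (hypothetical helper).

    Test-input generation: for each input, create variants that differ only in
    the protected attribute. Test oracle: the model's decision must not change.
    Returns the original inputs for which the oracle is violated.
    """
    failures = []
    for x in inputs:
        baseline = model(x)
        for v in values:
            if v == x[protected]:
                continue  # skip the input's own protected value
            variant = {**x, protected: v}
            if model(variant) != baseline:
                failures.append(x)
                break  # one violating variant is enough to flag this input
    return failures


# Usage with a deliberately biased toy classifier (hypothetical).
def toy_model(x: Dict[str, object]) -> int:
    return int(x["income"] > 50 and x["sex"] == "male")


data = [
    {"income": 60, "sex": "male"},
    {"income": 60, "sex": "female"},
    {"income": 40, "sex": "male"},
    {"income": 40, "sex": "female"},
]
preds = [toy_model(x) for x in data]  # [1, 0, 0, 0]
spd = statistical_parity_difference(preds, [x["sex"] for x in data], privileged="male")
print(spd)  # -0.5
failures = find_individual_discrimination(toy_model, data, "sex", ["male", "female"])
print(len(failures))  # 2: the two income-60 records
```

The two checks target different notions. The group metric aggregates outcomes over populations and needs no oracle beyond a chosen threshold. The flip test is a metamorphic-style oracle on single inputs, so it can expose discrimination even when group rates happen to look balanced.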

Abstract

Unfair behaviors of Machine Learning (ML) software have garnered increasing attention and concern among software engineers. To tackle this issue, extensive research has been dedicated to the fairness testing of ML software, and this paper offers a comprehensive survey of existing studies in this field. We collect 100 papers and organize them based on the testing workflow (i.e., how to test) and the testing components (i.e., what to test). Furthermore, we analyze the research focus, trends, and promising directions in fairness testing. We also identify widely adopted datasets and open-source tools for fairness testing.

Paper Structure

This paper contains 61 sections, 10 figures, 9 tables.

Figures (10)

  • Figure 1: Cumulative number of publications on fairness testing.
  • Figure 2: Structure of this paper.
  • Figure 3: Workflow of fairness testing.
  • Figure 4: Components to test in ML software.
  • Figure 5: Testing components of fairness testing.
  • ...and 5 more figures