Measuring Perceptions of Fairness in AI Systems: The Effects of Infra-marginality

Schrasing Tong; Minseok Jung; Ilaria Liccardi; Lalana Kagal

TL;DR

The authors argue that accounting for distributional context is critical to aligning algorithmic fairness metrics with human expectations in real-world applications.

Abstract

Differences in data distributions between demographic groups, known as the problem of infra-marginality, complicate how people evaluate fairness in machine learning models. We present a user study with 85 participants in a hypothetical medical decision-making scenario to examine two treatments: group-specific model performance and training data availability. Our results show that participants did not equate fairness with simple statistical parity. When group-specific performances were equal or unavailable, participants preferred models that produced equal outcomes; when performances differed, especially in ways consistent with data imbalances, they judged models that preserved those differences as more fair. These findings highlight that fairness judgments are shaped not only by outcomes, but also by beliefs about the causes of disparities. We discuss implications for popular group fairness definitions and system design, arguing that accounting for distributional context is critical to aligning algorithmic fairness metrics with human expectations in real-world applications.
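The contrast the abstract draws between statistical parity and group-specific performance can be made concrete with a minimal sketch. Everything below is an illustrative assumption: the toy labels and predictions, the helper names `selection_rate` and `accuracy`, and the group sizes are hypothetical and are not taken from the study. The sketch only shows that two groups with different base rates can receive positive predictions at the same rate while the model's accuracy differs between them.

```python
from typing import Sequence


def selection_rate(preds: Sequence[int]) -> float:
    """Fraction of cases that receive a positive prediction."""
    return sum(preds) / len(preds)


def accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Hypothetical toy data (not from the paper): the groups have different
# base rates of the positive label, 3/10 for Race A and 6/10 for Race B.
labels_a = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
preds_a = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

labels_b = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
preds_b = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

# Statistical parity compares positive-prediction rates across groups.
parity_gap = abs(selection_rate(preds_a) - selection_rate(preds_b))

# Group-specific performance compares accuracy within each group.
accuracy_gap = accuracy(preds_a, labels_a) - accuracy(preds_b, labels_b)

print(f"parity gap:   {parity_gap:.2f}")
print(f"accuracy gap: {accuracy_gap:.2f}")
```

In this toy case the parity gap is zero while accuracy differs by 30 percentage points, which is the kind of situation in which equal outcomes and equal performance pull in different directions.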

Paper Structure (19 sections, 3 figures)

This paper contains 19 sections and 3 figures.

Introduction
Related Research
User Study
Study Design
Procedure
Operationalization
The Scenario
Treatment on group-specific performance
Treatment on data availability
Experiment controls
Models presented in the Options
Participants
Data Validity and Pre-processing
Results
Participant Demographics
...and 4 more sections

Figures (3)

Figure 1: Means and standard errors of fairness perceptions for the 3 Options under the group-specific performance treatment. Group-specific accuracies (Race A/Race B) for the 7 subplots are NA/NA, 90/90, 70/70, 95/85, 75/65, 85/95, and 65/75. * signifies $p < 0.05$, ** signifies $p < 0.01$, and *** signifies $p < 0.001$.
Figure 2: Means and standard errors of fairness perceptions for the 3 Options when Race A $>$ Race B in group-specific performance. Subplots show the amount of training data for Race A relative to Race B: no info, 3x, 20x, and 1x, respectively. * signifies $p < 0.05$, ** signifies $p < 0.01$, and *** signifies $p < 0.001$.
Figure 3: Means and standard errors of fairness perceptions for the 3 Options when Race A $<$ Race B in group-specific performance. Subplots show the amount of training data for Race A relative to Race B: no info, 3x, 20x, and 1x, respectively. * signifies $p < 0.05$, ** signifies $p < 0.01$, and *** signifies $p < 0.001$.
