Measuring Perceptions of Fairness in AI Systems: The Effects of Infra-marginality

Schrasing Tong, Minseok Jung, Ilaria Liccardi, Lalana Kagal

TL;DR

The authors argue that accounting for distributional context is critical to aligning algorithmic fairness metrics with human expectations in real-world applications.

Abstract

Differences in data distributions between demographic groups, known as the problem of infra-marginality, complicate how people evaluate fairness in machine learning models. We present a user study with 85 participants in a hypothetical medical decision-making scenario to examine two treatments: group-specific model performance and training data availability. Our results show that participants did not equate fairness with simple statistical parity. When group-specific performances were equal or unavailable, participants preferred models that produced equal outcomes; when performances differed, especially in ways consistent with data imbalances, they judged models that preserved those differences as more fair. These findings highlight that fairness judgments are shaped not only by outcomes, but also by beliefs about the causes of disparities. We discuss implications for popular group fairness definitions and system design, arguing that accounting for distributional context is critical to aligning algorithmic fairness metrics with human expectations in real-world applications.
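To make the distinction between equal outcomes and group-specific performance concrete, here is a minimal sketch on invented toy data. The function names (`group_rates`, `statistical_parity_gap`), the labels, and the predictions are all hypothetical and are not taken from the study; the sketch only illustrates that a model can satisfy statistical parity while its accuracy differs between groups.

```python
from collections import defaultdict

def group_rates(groups, y_true, y_pred):
    """Per-group positive-prediction rate and accuracy (hypothetical helper)."""
    counts = defaultdict(lambda: {"n": 0, "positive": 0, "correct": 0})
    for g, t, p in zip(groups, y_true, y_pred):
        c = counts[g]
        c["n"] += 1
        c["positive"] += int(p == 1)   # model predicted the positive outcome
        c["correct"] += int(p == t)    # prediction matched the true label
    return {
        g: {"positive_rate": c["positive"] / c["n"],
            "accuracy": c["correct"] / c["n"]}
        for g, c in counts.items()
    }

def statistical_parity_gap(rates, a, b):
    """Difference in positive-prediction rates between groups a and b."""
    return rates[a]["positive_rate"] - rates[b]["positive_rate"]

# Invented toy data: four individuals per group.
groups = ["A"] * 4 + ["B"] * 4
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]

rates = group_rates(groups, y_true, y_pred)
print(rates)
print("parity gap:", statistical_parity_gap(rates, "A", "B"))
```

In this toy case both groups receive positive predictions at the same rate, so the parity gap is zero, yet the predictions are fully correct for group A and only half correct for group B. This is the kind of gap between outcome parity and group-specific accuracy that the study's treatments vary.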

Paper Structure (19 sections, 3 figures)

Figures (3)

  • Figure 1: Means and standard errors of fairness perceptions for the 3 Options under the group-specific performance treatment. Group-specific accuracies (Race A/Race B) for the 7 subplots are NA/NA, 90/90, 70/70, 95/85, 75/65, 85/95, and 65/75. * signifies $p < 0.05$, ** signifies $p < 0.01$, and *** signifies $p < 0.001$.
  • Figure 2: Means and standard errors of fairness perceptions for the 3 Options when Race A $>$ Race B in group-specific performance. Subplots show the amount of data for Race A relative to Race B: no info, 3x, 20x, and 1x, respectively. * signifies $p < 0.05$, ** signifies $p < 0.01$, and *** signifies $p < 0.001$.
  • Figure 3: Means and standard errors of fairness perceptions for the 3 Options when Race A $<$ Race B in group-specific performance. Subplots show the amount of data for Race A relative to Race B: no info, 3x, 20x, and 1x, respectively. * signifies $p < 0.05$, ** signifies $p < 0.01$, and *** signifies $p < 0.001$.