Aleatoric and Epistemic Discrimination: Fundamental Limits of Fairness Interventions

Hao Wang, Luxi He, Rui Gao, Flavio P. Calmon

TL;DR

The paper tackles algorithmic discrimination by separating biases inherent in the data (aleatoric) from those introduced by model-development choices (epistemic). It defines the fairness Pareto frontier $\mathsf{FairFront}(\alpha_{\text{SP}}, \alpha_{\text{EO}}, \alpha_{\text{OAE}})$ as the maximum achievable accuracy under group-fairness constraints and characterizes the feasible set of conditional prediction distributions $P_{\hat{Y}|S,Y}$ using Blackwell's comparison theorems. A greedy algorithm with convergence guarantees provides an upper-bound approximation to the frontier, enabling benchmarking of existing fairness interventions. Empirical studies on standard tabular datasets show that state-of-the-art (SOTA) fairness methods come close to this information-theoretic frontier when the data are complete, meaning that little epistemic discrimination remains. Disparate missing-data patterns, however, reveal substantial aleatoric discrimination and reduce the effectiveness of interventions. The framework offers guidance for data collection and missing-data handling to promote fair and accurate downstream decisions.
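
For readers who prefer symbols, the frontier described above can be sketched as a constrained optimization. This is a hedged reconstruction, not a quotation from the paper: the symbol $\mathcal{C}$ for the feasible set of $P_{\hat{Y}|S,Y}$ and the max-over-groups form of each constraint are assumptions, with SP, EO, and OAE read as statistical parity, equalized odds, and overall accuracy equality.

```latex
\begin{aligned}
\mathsf{FairFront}(\alpha_{\text{SP}}, \alpha_{\text{EO}}, \alpha_{\text{OAE}})
  \;=\; \max_{P_{\hat{Y}|S,Y} \in \mathcal{C}} \quad
  & \Pr(\hat{Y} = Y) \\
\text{s.t.} \quad
  & \max_{\hat{y},\, s,\, s'} \bigl| \Pr(\hat{Y}=\hat{y} \mid S=s) - \Pr(\hat{Y}=\hat{y} \mid S=s') \bigr| \le \alpha_{\text{SP}}, \\
  & \max_{\hat{y},\, y,\, s,\, s'} \bigl| \Pr(\hat{Y}=\hat{y} \mid S=s, Y=y) - \Pr(\hat{Y}=\hat{y} \mid S=s', Y=y) \bigr| \le \alpha_{\text{EO}}, \\
  & \max_{s,\, s'} \bigl| \Pr(\hat{Y}=Y \mid S=s) - \Pr(\hat{Y}=Y \mid S=s') \bigr| \le \alpha_{\text{OAE}}.
\end{aligned}
```

Setting a threshold to 1 switches that constraint off, so a single frontier can cover one, two, or all three fairness criteria at once.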

Abstract

Machine learning (ML) models can underperform on certain population groups due to choices made during model development and bias inherent in the data. We categorize sources of discrimination in the ML pipeline into two classes: aleatoric discrimination, which is inherent in the data distribution, and epistemic discrimination, which is due to decisions made during model development. We quantify aleatoric discrimination by determining the performance limits of a model under fairness constraints, assuming perfect knowledge of the data distribution. We demonstrate how to characterize aleatoric discrimination by applying Blackwell's results on comparing statistical experiments. We then quantify epistemic discrimination as the gap between a model's accuracy when fairness constraints are applied and the limit posed by aleatoric discrimination. We apply this approach to benchmark existing fairness interventions and investigate fairness risks in data with missing values. Our results indicate that state-of-the-art fairness interventions are effective at removing epistemic discrimination on standard (overused) tabular datasets. However, when data has missing values, there is still significant room for improvement in handling aleatoric discrimination.
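
To make the two quantities concrete, the following minimal Python sketch computes a binary classifier's accuracy and fairness violations, then reads off epistemic discrimination as the distance to a frontier value. The function names `fairness_report` and `epistemic_gap` are hypothetical, the frontier value is assumed to be supplied by some other procedure, and every (group, label) cell is assumed to be non-empty.

```python
import numpy as np

def fairness_report(y_true, y_pred, group):
    """Accuracy plus statistical-parity and equalized-odds gaps (binary labels)."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    groups = np.unique(group)
    accuracy = float(np.mean(y_true == y_pred))
    # Statistical parity: spread of P(Yhat = 1 | S = s) across groups.
    pos_rates = [np.mean(y_pred[group == g]) for g in groups]
    sp_gap = float(max(pos_rates) - min(pos_rates))
    # Equalized odds: largest spread of P(Yhat = 1 | S = s, Y = y) over y.
    eo_gap = 0.0
    for y in (0, 1):
        rates = [np.mean(y_pred[(group == g) & (y_true == y)]) for g in groups]
        eo_gap = max(eo_gap, float(max(rates) - min(rates)))
    return {"accuracy": accuracy, "sp_gap": sp_gap, "eo_gap": eo_gap}

def epistemic_gap(frontier_accuracy, model_accuracy):
    """Accuracy left on the table relative to the frontier at the same fairness level."""
    return frontier_accuracy - model_accuracy

# Toy data: two groups of four samples each (illustrative only).
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
group = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]
report = fairness_report(y_true, y_pred, group)
gap = epistemic_gap(0.9, report["accuracy"])  # 0.9 is a made-up frontier value
```

The comparison is only meaningful when the frontier value is evaluated at fairness thresholds that match the classifier's measured violations.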

Paper Structure

This paper contains 36 sections, 7 theorems, 38 equations, 5 figures, 2 tables, 1 algorithm.

Key Result

Lemma 1

The following three conditions are equivalent:

Figures (5)

  • Figure 1: We compare Reduction and FairProjection with (our upper bound estimate of) $\mathsf{FairFront}$ on the Adult (Left) and COMPAS (Right) datasets. We train a classifier that approximates the Bayes optimal and use it as a basis for Reduction and FairProjection. This result not only demonstrates the tightness of our approximation but also shows that SOTA fairness interventions have already achieved near-optimal fairness-accuracy curves.
  • Figure 2: We benchmark existing fairness interventions using (our upper bound estimate of) $\mathsf{FairFront}$. We use $\mathsf{FairFront}$ to quantify aleatoric discrimination and measure epistemic discrimination by comparing a classifier's accuracy and fairness violation with $\mathsf{FairFront}$. The results show that SOTA fairness interventions are effective at reducing epistemic discrimination.
  • Figure 3: Fairness risks of disparate missing patterns. The missing probabilities of group 0 (female in Adult/African-American in COMPAS) and group 1 (male in Adult/Caucasian in COMPAS) vary over $\{(10\%, 10\%), (50\%, 10\%), (70\%, 10\%)\}$. We apply Reduction and Baseline to the imputed data and plot their fairness-accuracy curves against $\mathsf{FairFront}$. As shown, the effectiveness of fairness interventions decreases substantially as the missing patterns become more disparate.
  • Figure 4: We reproduce our experiments on the German Credit dataset. Our observation is consistent with those on the previous two datasets: the fairness-accuracy curves given by SOTA fairness interventions, such as Reduction and FairProjection, are close to the information-theoretic limit.
  • Figure 5: We reproduce our experiments on the HSLS dataset with multi-group and multi-label pre-processing. On the right, we also demonstrate that $\mathsf{FairFront}$ can take into account multiple fairness considerations at once. We show how the fairness-accuracy curve changes as we add new types of group fairness constraints (i.e., adding OAE and SP constraints in addition to EO).
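
The disparate-missingness setup described for Figure 3 can be imitated with a short Python sketch. It is an illustration under stated assumptions, not the authors' pipeline: masking each feature entry independently and filling gaps with column means are both choices made here, and the names `inject_group_missingness` and `mean_impute` are hypothetical.

```python
import numpy as np

def inject_group_missingness(X, group, miss_prob, rng):
    """Mask each feature entry with a probability that depends on the row's group."""
    X = np.asarray(X, dtype=float).copy()
    group = np.asarray(group)
    # Look up the per-row missing probability from the row's group label.
    row_prob = np.array([miss_prob[g] for g in group])
    mask = rng.random(X.shape) < row_prob[:, None]
    X[mask] = np.nan
    return X

def mean_impute(X):
    """Replace NaNs with the column mean of the observed entries (an assumed imputer)."""
    col_means = np.nanmean(X, axis=0)
    return np.where(np.isnan(X), col_means, X)

rng = np.random.default_rng(0)
X = np.ones((2000, 3))
group = np.repeat([0, 1], 1000)
# Group 0 loses about 50% of its entries, group 1 about 10%.
X_missing = inject_group_missingness(X, group, {0: 0.5, 1: 0.1}, rng)
X_imputed = mean_impute(X_missing)
frac_missing_g0 = np.isnan(X_missing[group == 0]).mean()
frac_missing_g1 = np.isnan(X_missing[group == 1]).mean()
```

Training any fairness intervention on `X_imputed` rather than `X` then exposes how much accuracy and fairness the heavier-masked group loses.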

Theorems & Definitions (17)

  • Lemma 1: Blackwell (1951; 1953)
  • Definition 1
  • Definition 2
  • Remark 1
  • Proposition 1
  • Remark 2
  • Theorem 1
  • Theorem 2
  • Lemma 2: Adaptation of Theorem 3 in Blackwell (1951)
  • proof
  • ...and 7 more