Recovering from Biased Data: Can Fairness Constraints Improve Accuracy?

Avrim Blum, Kevin Stangl

TL;DR

The paper studies learning from biased training data and asks whether fairness constraints can improve accuracy. It shows that enforcing Equal Opportunity within ERM provably recovers the Bayes Optimal Classifier under a broad range of Under-Representation and Labeling Bias models, whereas other notions such as Equalized Odds, Demographic Parity, and Calibration recover it only in restricted cases or can be harmful in these settings. Re-Weighting can help in some bias regimes but may fail under combined biases, so the choice of fairness intervention must match the data-generation process. The results show that fairness constraints can improve true accuracy on the uncorrupted distribution, offering a practical route to robust decision-making in the presence of biased data.

Abstract

Multiple fairness constraints have been proposed in the literature, motivated by a range of concerns about how demographic groups might be treated unfairly by machine learning classifiers. In this work we consider a different motivation: learning from biased training data. We posit several ways in which training data may be biased, including having a noisier or negatively biased labeling process on members of a disadvantaged group, or a decreased prevalence of positive or negative examples from the disadvantaged group, or both. Given such biased training data, Empirical Risk Minimization (ERM) may produce a classifier that not only is biased but also has suboptimal accuracy on the true data distribution. We examine the ability of fairness-constrained ERM to correct this problem. In particular, we find that the Equal Opportunity fairness constraint (Hardt, Price, and Srebro 2016) combined with ERM will provably recover the Bayes Optimal Classifier under a range of bias models. We also consider other recovery methods including reweighting the training data, Equalized Odds, and Demographic Parity. These theoretical results provide additional motivation for considering fairness interventions even if an actor cares primarily about accuracy.
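
The idea of Equal Opportunity constrained ERM can be illustrated with a minimal sketch. It is not the paper's construction: the hypothesis class (one score threshold per group), the threshold grid, and the toy data below are all assumptions made for illustration. Equal Opportunity is taken here in the sense of Hardt, Price, and Srebro (2016), that is, equal true positive rates across groups, and the search simply picks the lowest-error pair of thresholds among those that satisfy it.

```python
from itertools import product

def tpr(scores, labels, t):
    """True positive rate of the rule 'predict positive iff score >= t'."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    return sum(s >= t for s in pos) / len(pos) if pos else 0.0

def errors(scores, labels, t):
    """Number of training mistakes made by the threshold rule."""
    return sum((s >= t) != (y == 1) for s, y in zip(scores, labels))

def eq_opp_erm(group_a, group_b, thresholds, tol=1e-9):
    """Brute-force ERM over threshold pairs with equal TPR in both groups."""
    best = None
    for ta, tb in product(thresholds, repeat=2):
        if abs(tpr(*group_a, ta) - tpr(*group_b, tb)) > tol:
            continue  # pair violates the Equal Opportunity constraint
        err = errors(*group_a, ta) + errors(*group_b, tb)
        if best is None or err < best[0]:
            best = (err, ta, tb)
    return best

# Hypothetical toy data: (scores, labels) per group. In group B the
# positives above 0.5 are under-represented relative to the negatives.
group_a = ([0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9],
           [0, 0, 0, 0, 1, 1, 1, 0])
group_b = ([0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9, 0.95],
           [0, 0, 0, 0, 0, 1, 0, 1, 0])
thresholds = [0.0, 0.5, 1.0]  # 1.0 labels everything negative

# Unconstrained ERM on group B alone prefers to reject everyone.
erm_b = min(thresholds, key=lambda t: errors(*group_b, t))
# The constrained search moves group B back to the 0.5 threshold.
best = eq_opp_erm(group_a, group_b, thresholds)
print(erm_b, best)
```

On this toy data, unconstrained ERM labels all of group B negative, while the constrained search returns the same threshold for both groups at the cost of one extra training error. The extra error is measured on the biased sample, which is the distinction the abstract draws between biased error and accuracy on the true distribution.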

Paper Structure

This paper contains 18 sections, 6 theorems, 29 equations, 3 figures, 1 table.

Key Result

Theorem 4.1

Assume true labels are generated by $P_{\mathscr{D},r}(h^*, \eta)$ corrupted by both Under-Representation bias and Labeling bias with parameters $\beta_{POS}, \beta_{NEG}, \nu$, and assume that [...]. Then $h^*=(h_{A}^{*}, h_{B}^{*})$ is the lowest biased error classifier satisfying Equality of Opportunity on the biased training distribution, and thus $h^*$ is recovered by Equal Opportunity constrained ERM.

Figures (3)

  • Figure 1: The schematic on the left displays data points with $p=1/2$, $h^{*}_{B}$ as a hyperplane, and $\eta=1/3$. The schematic on the right displays data drawn from the same distribution subject to the Under-Representation Bias with $\beta_{POS}=1/3$. Now there are more negative examples than positive examples above the hyperplane so the lowest error hypothesis classifies all examples on the right as negative.
  • Figure 2: This figure indicates the parameter region in which Equal Opportunity constrained ERM recovers $h^*$ under the Under-Representation Bias Model, with $r=1/3$ and $p=1/2$; it visualizes the corresponding Equal Opportunity inequality. Each pair $(\eta, \beta)$ is colored blue if it satisfies the inequality (so $h^*$ is recovered) and red otherwise. The plot shows that a smaller $\eta$ allows recovery from a lower $\beta$. The dashed black line marks the boundary between recovering $h^*$ and failing to recover it.
  • Figure 3: Differences between $h_B$ and $h_{B}^*$ measured with probabilities in the true data distribution (before the effects of the bias model).
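
The arithmetic behind the Figure 1 caption can be checked with a short sketch. It rests on one reading of the caption, stated here as an assumption: inside the region that $h^{*}_{B}$ labels positive, a fraction $1-\eta$ of examples carry a positive label and a fraction $\eta$ a negative one, and Under-Representation Bias keeps each positive example independently with probability $\beta_{POS}$ while keeping every negative example. The function name is hypothetical.

```python
from fractions import Fraction

def biased_region_masses(eta, beta_pos):
    """Relative mass of positive and negative labels in the region that
    h*_B labels positive, after under-representation bias (assumed model)."""
    pos_true = 1 - eta               # positives before bias
    neg_true = eta                   # label-noise negatives before bias
    pos_biased = beta_pos * pos_true  # positives kept with prob. beta_pos
    neg_biased = neg_true             # negatives are all kept
    return pos_biased, neg_biased

# Values from the Figure 1 caption: eta = 1/3, beta_POS = 1/3.
pos, neg = biased_region_masses(Fraction(1, 3), Fraction(1, 3))
erm_predicts_negative = neg > pos
print(pos, neg, erm_predicts_negative)  # 2/9 1/3 True
```

Under these assumptions the positives shrink from mass 2/3 to 2/9 while the negatives stay at 1/3, so negatives outnumber positives above the hyperplane, matching the caption's statement that the lowest-error hypothesis labels that region negative.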

Theorems & Definitions (13)

  • Definition 2.1
  • Definition 2.2
  • Theorem 4.1
  • Lemma 4.2
  • proof
  • Lemma 4.3
  • proof
  • Lemma 4.4
  • proof
  • Lemma 4.5
  • ...and 3 more