Learning from Discriminatory Training Data

Przemyslaw A. Grabowicz, Nicholas Perello, Kenta Takatsu

TL;DR

This work reframes fairness in supervised learning as robustness to discriminatory concept shifts that arise when training data contain direct or indirect discrimination. It introduces Optimal Interventional Mixture (OIM), a post-processing approach that probabilistically intervenes on protected attributes to minimize cross-loss on non-discriminatory test data while training on discriminatory data, and extends it with counterfactual mixtures (OCM) to address indirect discrimination. The authors formalize the problem, prove optimality properties under additive discriminatory perturbations, and benchmark OIM against a spectrum of fairness methods on synthetic and real-world datasets, including COMPAS, German Credit, and CelebA. Empirically, OIM achieves strong accuracy with reduced disparities and scales to multiple protected attributes, supporting practical deployment and alignment with legal notions of business necessity and fairness. The work also provides a publicly available FaX-AI library for implementing the proposed methods.

Abstract

Supervised learning systems are trained using historical data and, if the data was tainted by discrimination, they may unintentionally learn to discriminate against protected groups. We propose that fair learning methods, despite training on potentially discriminatory datasets, shall perform well on fair test datasets. Such dataset shifts crystallize application scenarios for specific fair learning methods. For instance, the removal of direct discrimination can be represented as a particular dataset shift problem. For this scenario, we propose a learning method that provably minimizes model error on fair datasets, while blindly training on datasets poisoned with direct additive discrimination. The method is compatible with existing legal systems and provides a solution to the widely discussed issue of protected groups' intersectionality by striking a balance between the protected groups. Technically, the method applies probabilistic interventions, has causal and counterfactual formulations, and is computationally lightweight: it can be used with any supervised learning model to prevent direct and indirect discrimination via proxies while maximizing model accuracy for business necessity.

Paper Structure

This paper contains 19 sections, 3 theorems, 7 equations, 14 figures, 1 table.

Key Result

Proposition 1

Let the non-discriminatory data have $u=f(\bm{x})+\nu$ and the data following a discriminatory concept shift have $y = f(\bm{x}) + h(\bm{z}) + \nu$, where $f$ and $h$ are some functions and $\nu$ is i.i.d. noise independent from $\bm{X}$ and $Z$. Assume that the same $\ell^p$ loss, either $\ell^1$ o
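The additive setup in Proposition 1 can be illustrated with a small simulation. The sketch below is not the authors' implementation and does not use the FaX-AI API; it assumes a hypothetical linear $f$, a hypothetical centered $h$ (so that $h(Z)$ averages to zero), a binary protected attribute, and fixed mixing weights equal to the empirical marginal of $Z$ rather than the optimized mixture the paper describes. It trains on the discriminatory target $y$ and evaluates on the non-discriminatory target $u$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000

# Binary protected attribute z and a feature x that is correlated with z (a proxy).
z = rng.integers(0, 2, size=n).astype(float)
x = 0.8 * z + rng.normal(size=n)
nu = rng.normal(scale=0.1, size=n)

# Hypothetical choices for illustration only (not taken from the paper).
def f(x):
    return 2.0 * x            # legitimate signal

def h(z):
    return 5.0 * (z - 0.5)    # additive discrimination, centered for P(z=1)=0.5

u = f(x) + nu                 # non-discriminatory target
y = f(x) + h(z) + nu          # target after the discriminatory concept shift

def fit_linear(features, target):
    """Ordinary least squares with an intercept column."""
    A = np.column_stack([features, np.ones(len(target))])
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    return coef

def predict_linear(coef, features):
    A = np.column_stack([features, np.ones(features.shape[0])])
    return A @ coef

def mixture_predict(coef, x, z_values, weights):
    """Average the full model's predictions over intervened values of z."""
    pred = np.zeros_like(x)
    for z_val, w in zip(z_values, weights):
        z_do = np.full_like(x, z_val)  # set z to z_val for every individual
        pred += w * predict_linear(coef, np.column_stack([x, z_do]))
    return pred

# Model trained on discriminatory labels with access to both x and z.
coef_full = fit_linear(np.column_stack([x, z]), y)
# Baseline that simply drops z; the proxy x can absorb part of h(z).
coef_blind = fit_linear(x[:, None], y)

z_values = np.array([0.0, 1.0])
weights = np.array([np.mean(z == 0.0), np.mean(z == 1.0)])  # empirical marginal of z

def mse(a, b):
    return float(np.mean((a - b) ** 2))

mse_full = mse(predict_linear(coef_full, np.column_stack([x, z])), u)
mse_blind = mse(predict_linear(coef_blind, x[:, None]), u)
mse_mix = mse(mixture_predict(coef_full, x, z_values, weights), u)

print(f"full model:           {mse_full:.3f}")
print(f"z-blind model:        {mse_blind:.3f}")
print(f"interventional mix:   {mse_mix:.3f}")
```

Under these assumptions, the full model reproduces $h(z)$ and the z-blind model leaks part of it through the proxy $x$, while averaging over intervened values of $z$ recovers a predictor close to $f(x)$. The choice of mixing weights matters in general; the marginal of $Z$ is used here only because $h$ was constructed to be centered.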

Figures (14)

  • Figure 1: Training data can be tainted in two ways: individuals belonging to underprivileged groups may be undersampled and, hence, models trained on this data may make larger errors for these groups (B), or some of the labels in the training data may be incorrect due to historical discrimination and, hence, models trained on this data may be biased against the underprivileged groups (C). These two dataset issues represent a covariate shift and a concept shift, respectively. This paper addresses discriminatory concept shifts.
  • Figure 2: Illustration of the two related goals for fair algorithmic learning, grounded in dataset shifts (top) and explainability literature (bottom). This work focuses on the former, while our prior work focused on the latter.
  • Figure 3: Average resilience to potentially discriminatory concept shifts decreases with the correlation between $X_1$ and $Z$. The coefficient that scales the discrimination in the training data is $\beta=0$ for the case of no discrimination (left) and $\beta=5$ for direct discrimination (right). Each point is an average over 100 random datasets. Error bars show $95\%$ confidence intervals.
  • Figure 4: Average absolute value of SHAP values for $X_1$ and $Z$ as the correlation between $X_1$ and $Z$ increases. Each point is an average over 100 random datasets. Error bars show $95\%$ confidence intervals.
  • Figure 5: Performance of learning algorithms inhibiting discrimination over COMPAS and German Credit datasets. Higher accuracy (ACC) and lower demographic disparity (DD), positive predictive disparity (PPD), and false positive disparity (FPD) are better.
  • ...and 9 more figures
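
The metrics named in the Figure 5 caption can be sketched for a binary classifier and a binary protected attribute. The definitions below are assumptions based on common usage (absolute between-group differences in positive prediction rate, positive predictive value, and false positive rate); the paper's exact definitions may differ, and the function name `disparities` is hypothetical.

```python
import numpy as np

def _rate(numerator_mask, denominator_mask):
    """Fraction of denominator cases that are also numerator cases."""
    den = denominator_mask.sum()
    return numerator_mask.sum() / den if den > 0 else float("nan")

def disparities(y_true, y_pred, z):
    """Accuracy plus absolute between-group gaps for groups z == 0 and z == 1."""
    y_true, y_pred, z = (np.asarray(a) for a in (y_true, y_pred, z))
    stats = {}
    for g in (0, 1):
        in_group = z == g
        predicted_pos = in_group & (y_pred == 1)
        actual_neg = in_group & (y_true == 0)
        stats[g] = {
            "positive_rate": _rate(predicted_pos, in_group),
            "ppv": _rate(predicted_pos & (y_true == 1), predicted_pos),
            "fpr": _rate(predicted_pos & (y_true == 0), actual_neg),
        }
    return {
        "ACC": float(np.mean(y_true == y_pred)),
        "DD": abs(stats[0]["positive_rate"] - stats[1]["positive_rate"]),
        "PPD": abs(stats[0]["ppv"] - stats[1]["ppv"]),
        "FPD": abs(stats[0]["fpr"] - stats[1]["fpr"]),
    }

# Toy data: four individuals per group.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
z      = [0, 0, 0, 0, 1, 1, 1, 1]

result = disparities(y_true, y_pred, z)
print(result)
```

In this toy example the classifier predicts the positive class for three of four members of group 0 but only one of four members of group 1, so all three disparities are nonzero even though accuracy is 0.75.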

Theorems & Definitions (15)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Definition 5
  • Example 1
  • Definition 6
  • Example 2
  • Definition 7
  • Example 3
  • ...and 5 more