Metrizing Fairness

Yves Rychener; Bahar Taskesen; Daniel Kuhn

TL;DR

The paper identifies conditions under which hard statistical parity (SP) constraints are guaranteed to improve predictive accuracy. It also proves that the unfairness-regularized prediction loss admits unbiased gradient estimators, constructed from random mini-batches of training samples, if unfairness is measured by the squared $\mathcal L^2$-distance or by a squared maximum mean discrepancy.

Abstract

We study supervised learning problems that have significant effects on individuals from two demographic groups, and we seek predictors that are fair with respect to a group fairness criterion such as statistical parity (SP). A predictor is SP-fair if the distributions of predictions within the two groups are close in Kolmogorov distance, and fairness is achieved by penalizing the dissimilarity of these two distributions in the objective function of the learning problem. In this paper, we identify conditions under which hard SP constraints are guaranteed to improve predictive accuracy. We also showcase conceptual and computational benefits of measuring unfairness with integral probability metrics (IPMs) other than the Kolmogorov distance. Conceptually, we show that the generator of any IPM can be interpreted as a family of utility functions and that unfairness with respect to this IPM arises if individuals in the two demographic groups have diverging expected utilities. We also prove that the unfairness-regularized prediction loss admits unbiased gradient estimators, which are constructed from random mini-batches of training samples, if unfairness is measured by the squared $\mathcal L^2$-distance or by a squared maximum mean discrepancy. In this case, the fair learning problem is amenable to efficient stochastic gradient descent (SGD) algorithms. Numerical experiments on synthetic and real data show that these SGD algorithms outperform state-of-the-art methods for fair learning in that they achieve superior accuracy-unfairness trade-offs, sometimes orders of magnitude faster.
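To make the mini-batch idea in the abstract concrete, the following is a minimal sketch of how a squared maximum mean discrepancy between the two groups' predictions could be estimated without bias from a batch. It uses the standard U-statistic estimator with a Gaussian kernel, which is an assumption: this summary does not reproduce the paper's own estimator, its kernel choice, or its batching scheme. The function names, the `bandwidth` and `lam` parameters, and the squared-error loss are all hypothetical choices made for illustration.

```python
import numpy as np


def gaussian_kernel(u, v, bandwidth=1.0):
    """Gaussian kernel matrix between two 1-D arrays of predictions."""
    diff = u[:, None] - v[None, :]
    return np.exp(-diff ** 2 / (2.0 * bandwidth ** 2))


def mmd2_unbiased(pred_a0, pred_a1, bandwidth=1.0):
    """Standard U-statistic estimate of the squared MMD between two samples.

    The within-group sums exclude the diagonal (i == j) terms, which is
    what removes the bias of the plug-in estimator. The estimate can be
    slightly negative when the two samples are close in distribution.
    """
    m, n = len(pred_a0), len(pred_a1)
    if m < 2 or n < 2:
        raise ValueError("need at least two samples per group")
    k00 = gaussian_kernel(pred_a0, pred_a0, bandwidth)
    k11 = gaussian_kernel(pred_a1, pred_a1, bandwidth)
    k01 = gaussian_kernel(pred_a0, pred_a1, bandwidth)
    within_0 = (k00.sum() - np.trace(k00)) / (m * (m - 1))
    within_1 = (k11.sum() - np.trace(k11)) / (n * (n - 1))
    cross = k01.mean()
    return within_0 + within_1 - 2.0 * cross


def regularized_batch_loss(pred, y, a, lam=1.0, bandwidth=1.0):
    """Illustrative objective: squared-error loss plus lam times the MMD^2 estimate.

    `a` is a 0/1 array with the group membership of each sample
    (hypothetical encoding, not taken from the paper).
    """
    mse = np.mean((pred - y) ** 2)
    unfairness = mmd2_unbiased(pred[a == 0], pred[a == 1], bandwidth)
    return mse + lam * unfairness
```

This sketch only evaluates the objective on one batch. To train with SGD, the same expression would be written in an automatic differentiation framework so that its gradient with respect to the model parameters can be taken; how batches must be drawn for that gradient to be unbiased is the subject of the paper and is not captured here.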

Paper Structure (28 sections, 17 theorems, 70 equations, 7 figures, 8 tables)

Introduction
Contributions.
Related work.
Notation.
Fairness in Supervised Learning
Unfairness Measures and Integral Probability Metrics
Numerical Solution of Fair Learning Problems
Empirical Risk Minimization
Stochastic Approximation
Numerical Experiments
Online Learning
Regression
Classification
Offline Learning
Regression
...and 13 more sections

Key Result

Proposition 2.3

Suppose that $\Omega$ is finite, $\mathcal{H}=\mathcal{L}(\mathcal{X},\mathbb{R})$, and problem \eqref{eq:loss-min} has a minimizer $h^\star$ that is $\mathbb{P}$-almost surely unique. If $\mathbb{P}_{Y|X}\perp A$, then $h^\star(X)\perp A$.

Figures (7)

Figure 1: Proof of Theorem \ref{thm:hard-fairness-accuracy}: The right chart zooms into the neighborhood of $h^\star_0$. The red (gray) lines represent the contours of the unbiased (biased) objective functions.
Figure 2: Test loss of the optimal regressors output by the biased and unbiased MFL methods for target batch sizes $\bar{N}\in\{4,50\}$ as a function of the number of training samples
Figure 3: Impact of the target batch size $\bar{N}$ (color-coded) on the means (dots) and standard errors (error bars) of the accuracy and the SP-unfairness of the trained classifiers on test data
Figure 4: $R^2$ vs SP-unfairness on test data for regression tasks averaged over 10 simulations
Figure 5: Accuracy vs SP-unfairness on test data for neural network-based (\ref{fig:drug}--\ref{fig:adult}) and linear (\ref{fig:drug-linear}--\ref{fig:adult-linear}) classification tasks averaged over 10 simulations
...and 2 more figures

Theorems & Definitions (42)

Definition 2.1: Group-fairness
Example 2.2: Enforcing SP can reduce accuracy and fairness
Proposition 2.3: Optimality implies SP
Example 2.4: $Y\perp A$ does not imply SP
Proposition 2.5: Geometry of $\mathcal{H}_{\rm fair}$
Example 2.6: Training on biased data
Theorem 2.7: SP improves accuracy
Remark 2.8: Loss functions
Remark 2.9: Bias models
Definition 3.1: Integral probability metric
...and 32 more
