The Fragility of Fairness: Causal Sensitivity Analysis for Fair Machine Learning

Jake Fawkes, Nic Fishman, Mel Andrews, Zachary C. Lipton

TL;DR

This work adapts tools from causal sensitivity analysis to the FairML context, providing a general framework which accommodates effectively any combination of fairness metric and bias that can be posed in the "oblivious setting", and shows that causal sensitivity analysis provides a powerful and necessary toolkit for gauging the informativeness of parity metric evaluations.

Abstract

Fairness metrics are a core tool in the fair machine learning literature (FairML), used to determine that ML models are, in some sense, "fair". Real-world data, however, are typically plagued by various measurement biases and other violated assumptions, which can render fairness assessments meaningless. We adapt tools from causal sensitivity analysis to the FairML context, providing a general framework which (1) accommodates effectively any combination of fairness metric and bias that can be posed in the "oblivious setting"; (2) allows researchers to investigate combinations of biases, resulting in non-linear sensitivity; and (3) enables flexible encoding of domain-specific constraints and assumptions. Employing this framework, we analyze the sensitivity of the most common parity metrics under 3 varieties of classifier across 14 canonical fairness datasets. Our analysis reveals the striking fragility of fairness assessments to even minor dataset biases. We show that causal sensitivity analysis provides a powerful and necessary toolkit for gauging the informativeness of parity metric evaluations. Our repository is available here: https://github.com/Jakefawkes/fragile_fair.
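
To make the fragility claim concrete, the following is a minimal toy sketch, not the authors' framework: the paper derives bounds through causal sensitivity analysis, whereas this only simulates one specific label error by hand. All names (`Record`, `make_group`, `flip_labels`, the gap functions) and all counts are hypothetical and chosen purely for illustration. It shows how relabelling 2.5% of the records in an oblivious-setting dataset (sensitive attribute, observed label, prediction) can open a true-positive-rate gap while leaving demographic parity untouched.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    group: int  # sensitive attribute A (hypothetical encoding: 0 or 1)
    label: int  # observed, possibly biased, outcome Y_P
    pred: int   # classifier decision Y_hat


def positive_rate(records, group):
    """P(Y_hat = 1 | A = group); does not depend on the labels."""
    rows = [r for r in records if r.group == group]
    return sum(r.pred for r in rows) / len(rows)


def true_positive_rate(records, group):
    """P(Y_hat = 1 | label = 1, A = group); depends on the labels."""
    rows = [r for r in records if r.group == group and r.label == 1]
    return sum(r.pred for r in rows) / len(rows)


def demographic_parity_gap(records):
    return abs(positive_rate(records, 0) - positive_rate(records, 1))


def equal_opportunity_gap(records):
    return abs(true_positive_rate(records, 0) - true_positive_rate(records, 1))


def make_group(group, tp, fn, fp, tn):
    """Build one group's records from its confusion-matrix counts."""
    return ([Record(group, 1, 1)] * tp + [Record(group, 1, 0)] * fn
            + [Record(group, 0, 1)] * fp + [Record(group, 0, 0)] * tn)


def flip_labels(records, group, from_label, pred, count):
    """Relabel `count` records in one (group, label, pred) cell."""
    out, remaining = [], count
    for r in records:
        in_cell = (r.group, r.label, r.pred) == (group, from_label, pred)
        if remaining > 0 and in_cell:
            out.append(Record(r.group, 1 - r.label, r.pred))
            remaining -= 1
        else:
            out.append(r)
    return out


# Toy data (assumed): both groups look identical under the observed labels.
observed = make_group(0, 40, 10, 10, 40) + make_group(1, 40, 10, 10, 40)

# Assumed bias: 5 of group 1's observed negatives (predicted negative)
# are in fact positives, i.e. 5 of 200 records carry a wrong label.
corrected = flip_labels(observed, group=1, from_label=0, pred=0, count=5)

print("demographic parity gap:", demographic_parity_gap(observed),
      "->", demographic_parity_gap(corrected))
print("equal opportunity gap: ", equal_opportunity_gap(observed),
      "->", equal_opportunity_gap(corrected))
```

Under the observed labels both gaps are zero. After the relabelling, the demographic parity gap is unchanged because it never consults the labels, while group 1's true positive rate falls from 40/50 to 40/55. A full sensitivity analysis would bound the metric over every bias consistent with the stated assumptions, rather than evaluating a single hand-picked perturbation as done here.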

Paper Structure

This paper contains 74 sections, 3 theorems, 13 equations, 16 figures, 2 tables.

Key Result

Proposition 1

So long as any additional unobserved variables $U^{\prime}$ satisfy the following: Then marginalizing over $U^{\prime}$ will lead to the same graph as Fig. \ref{fig:proxy_struc}.

Figures (16)

  • Figure 1: Example of a DAG and the corresponding SCM. Unobserved variables are dashed.
  • Figure 2: Causal graphs for each of the biases showing the assumed causal structure over all variables, and the implied structure upon marginalizing out $X$. Dashed lines denote varying assumptions.
  • Figure 3: Here we directly recreate the plots from Fogliato et al. (2020) for a predictor trained on the COMPAS dataset, allowing some probabilistic and causal assumptions to vary. The dashed lines represent exact bounds on each statistic for increasing $P(Y_P \neq Y)$, which follow from Fogliato et al. (2020) or from our derivations in Appendix \ref{ap:proxy_identification_results}. Panel (a) represents the original setting, where $P(Y_P=1 \mid Y=0) = 0$; in (b) we drop the dashed edge between $X$ and $Y_P$ in the causal graph in Fig. \ref{fig:proxy_struc}; and in (c) we instead take $P(Y_P=0 \mid Y=1) = 0$. At every point we query, the automatically derived bounds recover the algebraically derived bounds.
  • Figure 4: Combination of Proxy and Selection Bias for an equalized odds predictor on the Adult dataset.
  • Figure 5: Results of our cross-dataset study, in which we assess the sensitivity of multiple ML predictors trained to satisfy various parity constraints on the fairness benchmarking datasets listed in Appendix \ref{ap:cross_dataset_analysis}. Different metrics are susceptible to bias in different ways; notably, demographic parity is more robust than the more complicated, outcome-dependent metrics.
  • ...and 11 more figures

Theorems & Definitions (7)

  • Definition 1
  • Proposition 1
  • proof
  • Proposition 2
  • proof
  • Proposition 3
  • proof