Mind the Graph When Balancing Data for Fairness or Robustness

Jessica Schrouff, Alexis Bellot, Amal Rannen-Triki, Alan Malek, Isabela Albuquerque, Arthur Gretton, Alexander D'Amour, Silvia Chiappa

TL;DR

The paper analyzes when data balancing, which attempts to remove dependencies among covariates, outcomes, and auxiliary factors, yields fair or robust models. Working in a causal Bayesian network framework, it introduces a joint balancing operator that maps the training distribution $P^t$ to a balanced distribution $Q$. It derives sufficient conditions under which balancing yields risk-invariance and optimality across a set of target distributions $\mathcal{P}$, and shows that balancing is not simply equivalent to removing causal edges, since $Q$ need not factorize according to the altered graph. Through semi-synthetic MNIST and Amazon reviews experiments and a CelebA case study, it demonstrates that balancing can both improve and degrade fairness and robustness depending on the task and graph, and that balancing can interact unfavorably with regularization. The work emphasizes using the task's causal graph to guide mitigation choices and provides diagnostic guidance for distinguishing failure modes when applying data balancing.

Abstract

Failures of fairness or robustness in machine learning predictive settings can be due to undesired dependencies between covariates, outcomes and auxiliary factors of variation. A common strategy to mitigate these failures is data balancing, which attempts to remove those undesired dependencies. In this work, we define conditions on the training distribution for data balancing to lead to fair or robust models. Our results show that, in many cases, the balanced distribution does not correspond to selectively removing the undesired dependencies in a causal graph of the task, leading to multiple failure modes and even interference with other mitigation techniques such as regularization. Overall, our results highlight the importance of taking the causal graph into account before performing data balancing.

Paper Structure

This paper contains 38 sections, 15 theorems, 38 equations, 10 figures, 4 tables.

Key Result

Proposition 4.2

If $X^{\perp}_Z \mathrel{\perp\mspace{-10mu}\perp}_{Q} Z \mid Y$ and $X^{\perp}_Z$ is a sufficient statistic for $Y$ under $Q$, then the risk minimizer $f(X) := \mathbb{E}_{Q}[Y \mid X]$ is risk-invariant and optimal w.r.t. $\mathcal{P}$.
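The balancing operator behind these results can be illustrated with a small sketch. Assuming, as one common instantiation of data balancing, that the goal is to remove the dependence between the label $Y$ and an auxiliary factor $Z$ by importance weighting, the weights $w(y,z) = P(y)P(z)/P(y,z)$ turn the training distribution $P^t$ into a balanced distribution $Q$ in which $Y$ and $Z$ are independent. The variable names and simulated data below are illustrative, not taken from the paper:

```python
import numpy as np

# Illustrative sketch: joint balancing via importance weighting.
# w(y, z) = P(y) P(z) / P(y, z) reweights the training distribution P^t
# into a balanced distribution Q in which Y and Z are independent.

rng = np.random.default_rng(0)
n = 200_000
# Simulate a confounded training set: Z depends on Y.
y = rng.integers(0, 2, size=n)
z = (rng.random(n) < np.where(y == 1, 0.8, 0.2)).astype(int)

# Empirical joint and marginals of (Y, Z) under P^t.
joint = np.zeros((2, 2))
for yi in range(2):
    for zi in range(2):
        joint[yi, zi] = np.mean((y == yi) & (z == zi))
p_y = joint.sum(axis=1)
p_z = joint.sum(axis=0)

# Balancing weights and the induced distribution Q over (Y, Z).
w = p_y[:, None] * p_z[None, :] / joint
q = joint * w

# Under Q, the joint factorizes, so Y and Z are independent.
assert np.allclose(q, p_y[:, None] * p_z[None, :])
```

Note that, as the paper stresses, removing the $Y$–$Z$ dependence in this way does not in general make $Q$ factorize according to the graph with the corresponding edge removed; other dependencies can be altered as a side effect.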

Figures (10)

  • Figure 1: MNIST data samples.
  • Figure 2: Accuracy across different values of the MMD hyper-parameter for models trained on balanced data and evaluated on their respective training distribution (dashed) and ${P^0}$ (solid line) averaged across replicates. We consider anti-causal tasks: (left) purely spurious case, (middle) when another confounder $V$ is present, and (right) the entangled dataset. Worst group performance on ${P^0}$ is displayed in red. Markers display individual replicates.
  • Figure 3: Accuracy across different values of the confounder strength (i.e. different $P' \in \mathcal{P}$), for each value of MMD regularization considered (displayed by the color gradient). (a) Models trained on ${P^t}$. (b) Models trained on ${Q}$. Results are averaged across seeds for clarity. Notice the different y-scales. (c) Displays the mean and standard deviation across seeds for MMD=16.
  • Figure 4: Model performance on test sets sampled from ${P^t}$ (dotted) and ${Q}$ (dashed). The model is trained on ${P^t}$ with regularization $f(X) \mathrel{\perp\mspace{-10mu}\perp} Z \mid Y$.
  • Figure 5: Proportions of $Y \in \{0,1\}$ (grey bars) and $Z \in \{0,1\}$ (purple bars) before (left) and after (right) balancing the data on $Y$.
  • ...and 5 more figures
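The MMD regularization referenced in Figures 2 and 3 can be sketched as follows. This is a hypothetical, minimal implementation, not the paper's exact one: a biased RBF-kernel estimate of squared MMD between the model representations of the two $Z$ groups, which a penalty of the kind used to encourage $f(X) \mathrel{\perp\mspace{-10mu}\perp} Z \mid Y$ would compute within each stratum of $Y$ and sum; the hyper-parameter on the x-axis of Figure 2 would weight this penalty in the training loss:

```python
import numpy as np

def rbf_mmd2(a, b, sigma=1.0):
    """Biased (V-statistic) estimate of squared MMD with a Gaussian kernel."""
    def gram(x, y):
        # Pairwise squared Euclidean distances, then RBF kernel values.
        d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    return gram(a, a).mean() + gram(b, b).mean() - 2.0 * gram(a, b).mean()

rng = np.random.default_rng(1)
feats_z0 = rng.normal(size=(200, 2))           # representations for Z = 0
feats_z1 = rng.normal(loc=3.0, size=(200, 2))  # shifted: representations depend on Z
penalty = rbf_mmd2(feats_z0, feats_z1)         # large when the two groups differ
baseline = rbf_mmd2(feats_z0, rng.normal(size=(200, 2)))  # near zero when they match
```

The paper's observation is that such a penalty and data balancing can interact unfavorably, so applying both without consulting the causal graph may hurt rather than help.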

Theorems & Definitions (40)

  • Definition 1.1: Jointly balanced distribution
  • Definition 2.1: Optimality
  • Definition 2.2: Risk-invariance
  • Definition 4.1
  • Proposition 4.2
  • Corollary 4.3
  • Proposition 4.4
  • Proposition 5.1
  • Proposition A.1
  • Proof of Proposition \ref{prop:fm1}
  • ...and 30 more