Counterfactual Fairness by Combining Factual and Counterfactual Predictions

Zeyu Zhou, Tianci Liu, Ruqi Bai, Jing Gao, Murat Kocaoglu, David I. Inouye

TL;DR

This work analyzes Counterfactual Fairness (CF) within invertible causal models and shows that the Bayes-optimal predictor under CF can be constructed by mixing predictions at factual and counterfactual endpoints, yielding an optimal CF solution with predictable excess risk. It derives explicit excess-risk bounds for regression and classification tasks, tying them to the dependency between the target $Y$ and the sensitive attribute $A$ through the latent structure. To handle incomplete causal knowledge, the authors propose a Plug-in Counterfactual Fairness (PCF) approach and a Counterfactual Risk Minimization (CRM) strategy, providing guarantees when the counterfactual generator is estimated. Empirical results on synthetic and semi-synthetic data validate the theoretical insights, showing that PCF-based methods can outperform existing CF techniques in both perfect and imperfect counterfactual information regimes, with practical guidance on when and how to apply CRM. The work highlights the fundamental fairness-utility trade-off in CF and discusses limitations and future directions for deploying CF methods with real-world counterfactuals and pretrained predictors.

Abstract

In high-stakes domains such as healthcare and hiring, the role of machine learning (ML) in decision-making raises significant fairness concerns. This work focuses on Counterfactual Fairness (CF), which posits that an ML model's outcome on any individual should remain unchanged if they had belonged to a different demographic group. Previous works have proposed methods that guarantee CF. However, their effect on the model's predictive performance remains largely unclear. To fill this gap, we provide a theoretical study of the inherent trade-off between CF and predictive performance in a model-agnostic manner. We first propose a simple but effective method to cast an optimal but potentially unfair predictor into a fair one without losing optimality. By analyzing the excess risk incurred to achieve CF, we quantify this inherent trade-off. We further analyze our method's performance with access to only incomplete causal knowledge. Building on this analysis, we propose a performant algorithm that can be applied in such scenarios. Experiments on both synthetic and semi-synthetic datasets demonstrate the validity of our analysis and methods.

Paper Structure (42 sections, 7 theorems, 47 equations, 12 figures, 1 algorithm)

Key Result

Lemma 3.2

Under the paper's main assumption (asm:main), a predictor $\phi$ on $(X,A)$ is counterfactually fair if and only if it returns the same value for a sample and its counterfactual, i.e., $\mathrm{TE}(\phi)=0 \Leftrightarrow \phi(x,a) \overset{\text{a.s.}}{=} \phi(x_{1-a}, 1-a), \quad\forall (x,a)$, where $x_{1-a}$ denotes the counterfactual of $x$ had the sensitive attribute been $1-a$.
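
The equivalence in Lemma 3.2 can be checked numerically on a toy example. The sketch below is a minimal illustration, not the paper's implementation: the additive model X = U + SHIFT * A, the predictor phi_unfair, the helper names, and the choice of group-indexed mixing weights (1 - p_a1, p_a1) are all assumptions introduced here. It mixes the predictions at the factual and counterfactual points and measures the largest factual-counterfactual prediction gap on a few samples.

```python
import math

SHIFT = 2.0  # hypothetical effect of A on X in the toy model X = U + SHIFT * A


def counterfactual(x: float, a: int) -> float:
    """Toy invertible counterfactual map: flipping A moves x by +/- SHIFT."""
    return x + (1 - 2 * a) * SHIFT


def phi_unfair(x: float, a: int) -> float:
    """Hypothetical predictor that depends on both x and a."""
    return 1.0 / (1.0 + math.exp(-(x - 1.0 + 0.5 * a)))


def mix_predictor(phi, p_a1: float):
    """Mix factual and counterfactual predictions with group-indexed weights.

    The weights (1 - p_a1, p_a1) are an assumption of this sketch.
    """
    weights = {0: 1.0 - p_a1, 1: p_a1}

    def phi_mixed(x: float, a: int) -> float:
        x_cf = counterfactual(x, a)
        return weights[a] * phi(x, a) + weights[1 - a] * phi(x_cf, 1 - a)

    return phi_mixed


def cf_gap(phi, samples) -> float:
    """Largest |phi(x, a) - phi(x_{1-a}, 1-a)| over the given samples."""
    return max(abs(phi(x, a) - phi(counterfactual(x, a), 1 - a)) for x, a in samples)


samples = [(-1.0, 0), (0.3, 0), (2.5, 1), (4.0, 1)]
phi_fair = mix_predictor(phi_unfair, p_a1=0.4)

gap_unfair = cf_gap(phi_unfair, samples)  # clearly positive
gap_fair = cf_gap(phi_fair, samples)      # zero up to floating-point error
print(f"unfair gap: {gap_unfair:.3f}, mixed gap: {gap_fair:.2e}")
```

Because the weights are attached to the group rather than to the factual or counterfactual role, and because the toy counterfactual map is its own inverse, both members of a pair evaluate the same weighted sum. The mixed predictor therefore satisfies the lemma's condition on these samples, while phi_unfair does not. A finite-sample check like this only illustrates the almost-sure condition; it does not prove it.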

Figures (12)

  • Figure 1: The optimal (unfair) predictor (a) violates counterfactual fairness in the middle region because the predictions are different for the factual-counterfactual pairs (denoted by line segments between $a=0$ and $a=1$). We prove that the optimal fair predictor (b) simply mixes the optimal unfair predictions at the factual and counterfactual points (i.e., mixes the predictions at both endpoints of the line). This mixing incurs the inherent excess risk associated with counterfactual fairness. Colors represent target classes ($Y$), and dot styles represent sensitive attributes ($A$).
  • Figure 2: Causal graph. $A$ represents the sensitive attribute, $Y$ represents the target variable, $U$ represents latent confounders, and $X$ represents observed features. Note that our theoretical analysis holds for all causal models that satisfy the paper's main assumption (asm:main); it is not restricted to this specific graph.
  • Figure 3: Results on synthetic datasets given ground truth counterfactuals.
  • Figure 4: Results on synthetic datasets under counterfactual estimation error. Colors represent different values of $\alpha$, the standard deviation of the error ($\epsilon \sim \mathcal{N}(0,\alpha)$), while shapes represent different algorithms. Results with different $\beta$ can be found in the appendix on experimental results (app-sec:exp-result).
  • Figure 5: Results on synthetic datasets comparing PCF and PCF-Analytic. Colors represent different values of $\alpha$, the standard deviation of the error ($\epsilon \sim \mathcal{N}(0,\alpha)$), while shapes represent different algorithms. Results with different $\beta$ can be found in the appendix on experimental results (app-sec:exp-result).
  • ...and 7 more figures
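
The counterfactual estimation error setting of Figures 4 and 5 can be pictured with a small simulation. The captions do not say how the error $\epsilon$ enters, so this sketch assumes additive Gaussian noise on the counterfactual feature; the toy model X = U + SHIFT * A, the linear predictor phi, the equal mixing weights, and all function names are likewise hypothetical. It reports the mean factual-counterfactual prediction gap of a plug-in mixed predictor for several noise levels.

```python
import random

SHIFT = 2.0  # hypothetical effect of A on X in the toy model X = U + SHIFT * A


def true_counterfactual(x: float, a: int) -> float:
    """Exact counterfactual feature under the toy additive model."""
    return x + (1 - 2 * a) * SHIFT


def noisy_counterfactual(x: float, a: int, alpha: float, rng: random.Random) -> float:
    """Assumed error model: true counterfactual plus N(0, alpha) noise (alpha = std)."""
    return true_counterfactual(x, a) + rng.gauss(0.0, alpha)


def phi(x: float, a: int) -> float:
    """Hypothetical (unfair) linear predictor."""
    return 0.8 * x + 0.5 * a


def plug_in(x: float, a: int, x_cf_hat: float, p_a1: float = 0.5) -> float:
    """Mix predictions at the factual point and the estimated counterfactual."""
    weights = {0: 1.0 - p_a1, 1: p_a1}
    return weights[a] * phi(x, a) + weights[1 - a] * phi(x_cf_hat, 1 - a)


def mean_gap(alpha: float, n: int = 2000, seed: int = 0) -> float:
    """Average |prediction at (x, a) - prediction at the true counterfactual|."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        a = rng.randint(0, 1)
        x = rng.gauss(0.0, 1.0) + SHIFT * a
        x_cf = true_counterfactual(x, a)
        # Each point only has access to a noisy estimate of its own counterfactual.
        pred_factual = plug_in(x, a, noisy_counterfactual(x, a, alpha, rng))
        pred_counter = plug_in(x_cf, 1 - a, noisy_counterfactual(x_cf, 1 - a, alpha, rng))
        total += abs(pred_factual - pred_counter)
    return total / n


gaps = {alpha: mean_gap(alpha) for alpha in (0.0, 0.5, 1.0)}
for alpha, gap in gaps.items():
    print(f"alpha={alpha}: mean gap={gap:.3f}")
```

In this toy setting the gap vanishes when the counterfactual is exact and grows with $\alpha$, which mirrors, but does not reproduce, the comparison across noise levels that the figure captions describe.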

Theorems & Definitions (17)

  • Definition 2.1
  • Definition 2.2
  • Lemma 3.2
  • Theorem 3.3
  • Theorem 3.4
  • Proposition 3.5
  • Theorem 3.6
  • Remark 3.7
  • Proof of the theorem labeled thm:perfect-te
  • Lemma A.1: Optimal Predictor is Conditional Mean
  • ...and 7 more