Consistent End-to-End Estimation for Counterfactual Fairness

Yuchen Ma, Valentyn Melnychuk, Dennis Frauen, Stefan Feuerriegel

TL;DR

This work tackles counterfactual fairness by learning the counterfactual distribution of mediators with a Generative Counterfactual Fairness Network (GCFN). GCFN combines a tailored GAN (Step 1), which generates counterfactual mediators, with a counterfactual mediator regularization (Step 2), which enforces fairness in the prediction model. The method comes with theoretical guarantees for identifiability and consistency under bijective generation mechanisms. Empirically, GCFN achieves state-of-the-art performance on (semi-)synthetic data and improves fairness on real-world datasets such as UCI Adult and COMPAS, while a fairness weight $\lambda$ controls the accuracy-fairness trade-off. The approach thereby addresses a core weakness of latent-variable baselines, which lack such guarantees, using a streamlined GAN-based framework.

Abstract

Fairness in predictions is of direct importance in practice due to legal, ethical, and societal reasons. This is often accomplished through counterfactual fairness, which ensures that the prediction for an individual is the same as that in a counterfactual world under a different sensitive attribute. However, achieving counterfactual fairness is challenging as counterfactuals are unobservable, and, because of that, existing baselines for counterfactual fairness do not have theoretical guarantees. In this paper, we propose a novel counterfactual fairness predictor for making predictions under counterfactual fairness. Here, we follow the standard counterfactual fairness setting and directly learn the counterfactual distribution of the descendants of the sensitive attribute via tailored neural networks, which we then use to enforce fair predictions through a novel counterfactual mediator regularization. Unique to our work is that we provide theoretical guarantees that our method is effective in ensuring the notion of counterfactual fairness. We further compare the performance across various datasets, where our method achieves state-of-the-art performance.

Paper Structure

This paper contains 50 sections, 5 theorems, 45 equations, 18 figures, 3 tables, 1 algorithm.

Key Result

Lemma 1

Let the observational distribution $\mathbb{P}_{X,A,M} = \mathbb{P}_\mathrm{f}$ be induced by an SCM $\mathcal{M} = \langle \mathbf{V}, \mathbf{U}, \mathcal{F}, \mathbb{P}(\mathbf{U}) \rangle$ with … and with the causal graph as in Figure 1. Let $\mathcal{M} \subseteq \mathbb{R}$ and $f_M$ be a bijective generation mechanism (BGM) [nasr2023counterfactual, melnychuk2023partial], i.e., $f_M$ …

Figures (18)

  • Figure 1: Causal graph. The nodes represent: sensitive attribute, $A$; covariate, $X$; mediator, $M$; target, $Y$. $\longrightarrow$ represents direct causal effect; $\dashleftarrow \dashrightarrow$ represents potential presence of hidden confounders.
  • Figure 2: Overview of our GCFN for consistent estimation. Step 1: A deterministic generator $G$ takes $\left(X, A, M \right)$ as input and outputs $\hat{M}_A$ and $\hat{M}_{A'}$. A discriminator $D$ then differentiates the observed factual mediator $M$ from the generated counterfactual mediator $\hat{M}_{A'}$. We train $s$ different generator-discriminator pairs and consider the worst-case counterfactual fairness. Step 2: We then use the generated counterfactual mediator $\hat{M}_{A'}$ in our counterfactual mediator regularization $\mathcal{R}_\mathrm{cm}$. We take the supremum of $\mathcal{R}_\mathrm{cm}$ over the generators to choose the most 'unfair' one. In this way, we enforce worst-case counterfactual fairness for the prediction model $h(x, m)$.
  • Figure 3: Results for LSAC dataset with two different data-generating mechanisms. A larger utility is better. Shown: mean $\pm$ std over 5 runs.
  • Figure 4: Density of the predicted target variable (salary) across male vs. female. Left: w/o our $\mathcal{R}_\mathrm{{cm}}$. Right: w/ our $\mathcal{R}_\mathrm{{cm}}$.
  • Figure 5: Trade-off between accuracy (ACC) and counterfactual fairness (CF) across different $\lambda$. ACC: the higher ($\uparrow$) the better. CF: the lower ($\downarrow$) the better.
  • ...and 13 more figures
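
Step 2 of the Figure 2 caption can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that $\mathcal{R}_\mathrm{cm}$ is the mean squared gap between the prediction on the factual mediator and the prediction on the generated counterfactual mediator, and that the prediction loss is a mean squared error. The function names, the toy data, and the shifted mediators that stand in for the outputs of $s = 3$ trained generators are all hypothetical.

```python
import numpy as np


def cm_regularization(h, x, m, m_cf):
    """Assumed form of R_cm: mean squared gap between h(x, m) and h(x, m_cf)."""
    return float(np.mean((h(x, m) - h(x, m_cf)) ** 2))


def worst_case_objective(h, x, m, y, m_cf_candidates, lam):
    """Prediction loss plus lam times the largest R_cm over all candidate generators."""
    pred_loss = float(np.mean((h(x, m) - y) ** 2))  # assumed MSE prediction loss
    # Supremum over the s generators: keep the most 'unfair' counterfactual.
    r_cm = max(cm_regularization(h, x, m, m_cf) for m_cf in m_cf_candidates)
    return pred_loss + lam * r_cm, pred_loss, r_cm


# Toy data (hypothetical): covariate x, factual mediator m, target y.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
m = rng.normal(size=100)
y = x + m

# Stand-ins for the counterfactual mediators produced by s = 3 generators.
m_cf_candidates = [m + shift for shift in (0.5, 1.0, 2.0)]


def h_unfair(x, m):
    """Hypothetical predictor that relies on the mediator."""
    return x + m


def h_fair(x, m):
    """Hypothetical predictor that ignores the mediator."""
    return x + 0.0 * m


total_unfair, loss_unfair, r_unfair = worst_case_objective(
    h_unfair, x, m, y, m_cf_candidates, lam=0.5
)
total_fair, loss_fair, r_fair = worst_case_objective(
    h_fair, x, m, y, m_cf_candidates, lam=0.5
)
print(f"unfair predictor: loss={loss_unfair:.3f}, R_cm={r_unfair:.3f}")
print(f"fair predictor:   loss={loss_fair:.3f}, R_cm={r_fair:.3f}")
```

In this toy setup the mediator-dependent predictor fits the target exactly but incurs the largest regularization from the generator with the biggest shift. The mediator-free predictor has zero regularization but a higher prediction loss. The weight `lam` plays the role of $\lambda$ in trading one against the other.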

Theorems & Definitions (12)

  • Lemma 1: Consistent estimation of the counterfactual distribution with GAN (up to a measure-preserving indeterminacy)
  • proof
  • Remark 1
  • Lemma 2: Counterfactual mediator regularization bound
  • proof
  • Lemma 3: Consistent estimation of the counterfactual distribution with GAN (up to a measure-preserving indeterminacy)
  • proof
  • Corollary 1
  • proof
  • Remark 2
  • ...and 2 more