Consistent End-to-End Estimation for Counterfactual Fairness
Yuchen Ma, Valentyn Melnychuk, Dennis Frauen, Stefan Feuerriegel
TL;DR
This work tackles counterfactual fairness by learning the counterfactual distribution of mediators with a Generative Counterfactual Fairness Network (GCFN). GCFN combines a tailored GAN that generates counterfactual mediators (Step 1) with a counterfactual mediator regularization that enforces fair predictions (Step 2), and it comes with theoretical guarantees for identifiability and consistency under bijective generation mechanisms. Empirically, GCFN achieves state-of-the-art performance on (semi-)synthetic data and demonstrates practical fairness improvements on real-world datasets such as UCI Adult and COMPAS, while offering a controllable accuracy-fairness trade-off via a fairness weight $\lambda$. By providing these guarantees within a single GAN-based framework, the approach addresses core weaknesses of latent-variable baselines.
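To make the role of the fairness weight $\lambda$ concrete, the following is a minimal sketch of a Step 2 style objective: a prediction loss plus $\lambda$ times a penalty on the gap between predictions under factual and generated counterfactual mediators. It is an illustration under stated assumptions, not the authors' implementation. The names `regularized_loss` and `toy_predict`, the use of squared error for both terms, and the shifted mediator standing in for the Step 1 generator output are all hypothetical choices made here.

```python
import numpy as np

def regularized_loss(predict, x, m, m_cf, y, lam):
    """Prediction loss plus lam times a counterfactual mediator penalty.

    Assumption: both terms use mean squared error; the summary above does
    not state the exact loss functions.
    """
    y_hat = predict(x, m)        # prediction with factual mediators
    y_hat_cf = predict(x, m_cf)  # prediction with counterfactual mediators
    pred_loss = float(np.mean((y_hat - y) ** 2))
    cf_penalty = float(np.mean((y_hat - y_hat_cf) ** 2))
    return pred_loss + lam * cf_penalty, pred_loss, cf_penalty

def toy_predict(x, m):
    # Hypothetical stand-in for the prediction model.
    return 0.5 * x + 2.0 * m

rng = np.random.default_rng(0)
x = rng.normal(size=100)     # covariates
m = rng.normal(size=100)     # factual mediators
m_cf = m + 1.0               # placeholder for mediators generated in Step 1
y = 0.5 * x + 2.0 * m        # toy targets that the toy predictor fits exactly

total, pred_loss, cf_penalty = regularized_loss(
    toy_predict, x, m, m_cf, y, lam=0.5
)
print(total, pred_loss, cf_penalty)
```

In this toy setup the predictor fits the factual data perfectly yet still depends on the mediator, so the penalty is nonzero. Raising `lam` puts more weight on closing that gap at the possible expense of accuracy, which is the trade-off the summary describes.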
Abstract
Fairness in predictions is of direct importance in practice for legal, ethical, and societal reasons. It is often pursued through counterfactual fairness, which requires that the prediction for an individual is the same as it would be in a counterfactual world under a different sensitive attribute. However, achieving counterfactual fairness is challenging because counterfactuals are unobservable, and, as a result, existing baselines for counterfactual fairness do not have theoretical guarantees. In this paper, we propose a novel predictor for making predictions under counterfactual fairness. We follow the standard counterfactual fairness setting and directly learn the counterfactual distribution of the descendants of the sensitive attribute via tailored neural networks, which we then use to enforce fair predictions through a novel counterfactual mediator regularization. Unique to our work is that we provide theoretical guarantees that our method is effective in ensuring the notion of counterfactual fairness. We further compare performance across various datasets, where our method achieves state-of-the-art performance.
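For reference, the requirement that a prediction be unchanged in the counterfactual world can be written out in one common formalization from the counterfactual fairness literature. The notation is an assumption introduced here and may differ from the paper's: $A$ is the sensitive attribute, $X$ the observed covariates, $U$ the latent background variables, and $\hat{Y}_{A \leftarrow a'}(U)$ the prediction had $A$ been set to $a'$. A predictor $\hat{Y}$ is counterfactually fair if, for every $y$, every context $X = x$, $A = a$, and every attainable value $a'$,

$$
P\big(\hat{Y}_{A \leftarrow a}(U) = y \mid X = x, A = a\big) = P\big(\hat{Y}_{A \leftarrow a'}(U) = y \mid X = x, A = a\big).
$$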
