Fair Domain Generalization: An Information-Theoretic View
Tangzheng Lian, Guanyu Hu, Dimitrios Kollias, Xinyu Yang, Oya Celiktutan
TL;DR
This work tackles Fair Domain Generalization (FairDG), seeking to minimize both risk on unseen target domains and fairness violations under domain shifts. It provides novel mutual-information–based upper bounds for target risk and Equalized Odds violations, and then develops a practical Pareto-optimized framework (PAFDG) that learns domain- and group-invariant representations. The method uses differentiable dependence measures (distance correlation) on learned encodings and trains via a two-stage process to yield a Pareto front of utility–fairness trade-offs, with an efficient lambda-conditioned training strategy. Experiments on CelebA, AffectNet, and Jigsaw demonstrate superior Pareto fronts and single-solution performance compared to DG and fairness baselines, highlighting scalability to multi-class, multi-group fairness under distribution shifts.
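To make the dependence measure named above concrete, here is a minimal NumPy sketch of the standard empirical distance correlation between two samples. It is an illustration of the statistic only, not the PAFDG implementation: the function names, the synthetic data, and the idea of passing encodings as `x` and one-hot group or domain labels as `y` are assumptions, and a training pipeline would need a differentiable version written in an autodiff framework.

```python
import numpy as np


def _centered_distances(x):
    """Pairwise Euclidean distance matrix of the rows of x, double-centered."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a 1-D input as n samples with one feature
    diff = x[:, None, :] - x[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    # Subtract row and column means, then add back the grand mean.
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation(x, y):
    """Empirical (V-statistic) distance correlation; lies in [0, 1]."""
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = (a * b).mean()   # squared distance covariance
    dvar_x = (a * a).mean()  # squared distance variance of x
    dvar_y = (b * b).mean()  # squared distance variance of y
    denom = np.sqrt(dvar_x * dvar_y)
    if denom <= 0.0:
        return 0.0  # one sample is constant, so no dependence is measurable
    return float(np.sqrt(max(dcov2, 0.0) / denom))


if __name__ == "__main__":
    rng = np.random.default_rng(0)   # synthetic data, for illustration only
    z = rng.normal(size=(200, 4))    # stand-in for learned encodings
    s = rng.integers(0, 3, size=200) # stand-in for a 3-group sensitive attribute
    s_onehot = np.eye(3)[s]          # assumed encoding of group labels
    print(distance_correlation(z, s_onehot))     # unrelated samples: small value
    print(distance_correlation(z[:, 0], 2 * z[:, 0] + 1))  # approximately 1.0
```

A value near 0 suggests the encoding carries little information about the group or domain label, which is the property an invariant representation aims for; a value near 1 indicates strong dependence.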
Abstract
Domain generalization (DG) and algorithmic fairness are two critical challenges in machine learning. However, most DG methods focus only on minimizing expected risk in the unseen target domain without considering algorithmic fairness. Conversely, fairness methods typically do not account for domain shifts, so the fairness achieved during training may not generalize to unseen test domains. In this work, we bridge these gaps by studying the problem of Fair Domain Generalization (FairDG), which aims to minimize both expected risk and fairness violations in unseen target domains. We derive novel mutual information-based upper bounds for expected risk and fairness violations in multi-class classification tasks with multi-group sensitive attributes. These bounds provide key insights for algorithm design from an information-theoretic perspective. Guided by these insights, we introduce PAFDG (Pareto-Optimal Fairness for Domain Generalization), a practical framework that solves the FairDG problem and models the utility-fairness trade-off through Pareto optimization. Experiments on real-world vision and language datasets show that PAFDG achieves superior utility-fairness trade-offs compared to existing methods.
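The utility-fairness trade-off mentioned above can be illustrated with a small Pareto-front filter. This is a minimal sketch, not the PAFDG procedure: it assumes each trained model is summarized by a hypothetical pair (utility, fairness violation), where higher utility and lower violation are better, and the numbers below are made up for illustration.

```python
def pareto_front(points):
    """Return the non-dominated (utility, violation) pairs, sorted by utility.

    A point is dominated if another point is at least as good on both
    objectives (utility >= and violation <=) and strictly better on one.
    """
    front = []
    for i, (u_i, v_i) in enumerate(points):
        dominated = any(
            (u_j >= u_i and v_j <= v_i) and (u_j > u_i or v_j < v_i)
            for j, (u_j, v_j) in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append((u_i, v_i))
    return sorted(front)


if __name__ == "__main__":
    # Hypothetical (accuracy, fairness violation) pairs, one per trained model.
    candidates = [
        (0.90, 0.20),
        (0.85, 0.10),
        (0.80, 0.15),  # dominated by (0.85, 0.10)
        (0.70, 0.05),
        (0.90, 0.25),  # dominated by (0.90, 0.20)
    ]
    print(pareto_front(candidates))
    # [(0.7, 0.05), (0.85, 0.1), (0.9, 0.2)]
```

Comparing methods by their whole front, rather than by a single operating point, shows which method offers the better utility at every level of fairness violation.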
