Towards Counterfactual Fairness-aware Domain Generalization in Changing Environments
Yujie Lin, Chen Zhao, Minglai Shao, Baoluo Meng, Xujiang Zhao, Haifeng Chen
TL;DR
This work tackles domain generalization under evolving environments while preserving counterfactual fairness. It introduces DCFDG, a causal disentanglement framework that partitions exogenous information into four latent variables, $U_s$, $U_{ns}$, $U_{v1}$, and $U_{v2}$, with temporal priors to capture domain evolution. The model optimizes a joint objective that combines an ELBO-based reconstruction term, a counterfactual fairness loss $\mathcal{L}_{f}$, and an adversarial loss $\mathcal{L}_{TC}$, so that semantic information is disentangled from sensitive and environment-specific factors; a theoretical KL bound supports cross-domain training. Experiments on the synthetic FairCircle dataset and the real-world Adult and Chicago Crime datasets show improved accuracy and fairness on unseen domains in the sequence.
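As a rough sketch of how the three terms named above might fit together, assume they are combined as a weighted sum. The weights $\lambda_f$ and $\lambda_{TC}$ and the symbol $\mathcal{L}_{\mathrm{ELBO}}$ are hypothetical notation introduced here for illustration; the paper's exact objective, including how the adversarial term is optimized, may differ.

```latex
% Illustrative only: \lambda_f and \lambda_{TC} are assumed nonnegative
% trade-off weights, not values or symbols taken from the paper.
\min_{\theta} \;
  -\mathcal{L}_{\mathrm{ELBO}}(\theta)
  \;+\; \lambda_f \, \mathcal{L}_{f}(\theta)
  \;+\; \lambda_{TC} \, \mathcal{L}_{TC}(\theta)
```

Under this reading, the ELBO is maximized (hence the minus sign), while $\mathcal{L}_{f}$ penalizes prediction changes under counterfactual changes of the sensitive attribute and $\mathcal{L}_{TC}$ encourages the latent blocks to stay disentangled.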
Abstract
Domain shift is a common challenge in machine learning, and various domain generalization (DG) techniques have been developed to improve the performance of machine learning systems on out-of-distribution (OOD) data. In real-world scenarios, moreover, data distributions can change gradually across a sequence of domains. Current methods focus primarily on model effectiveness in these new domains and often overlook fairness during learning. In response, we introduce a framework called Counterfactual Fairness-Aware Domain Generalization with Sequential Autoencoder (CDSAE). This approach separates environmental information and sensitive attributes from the embedded representation of classification features. Performing both separations at once improves model generalization to diverse, unseen domains and also addresses unfair classification. Our strategy is rooted in the principles of causal inference. To examine the relationship between semantic information, sensitive attributes, and environmental cues, we categorize exogenous uncertainty factors into four latent variables: 1) semantic information influenced by sensitive attributes, 2) semantic information unaffected by sensitive attributes, 3) environmental cues influenced by sensitive attributes, and 4) environmental cues unaffected by sensitive attributes. With fairness regularization incorporated, we use only semantic information for classification. Experiments on synthetic and real-world datasets demonstrate the effectiveness of our approach, showing improved accuracy while preserving fairness as domains evolve continuously.
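The four-way partition described above can be illustrated with a minimal sketch. It is not the authors' implementation: the block names `u_s`, `u_ns`, `u_v1`, `u_v2` assume the TL;DR's symbols map to the four categories in the order listed, and `split_latent`, `predict`, the block sizes, and the toy logistic classifier are all hypothetical. The sketch shows only the structural point that a classifier reading the two semantic blocks is unaffected by changes in the environmental blocks.

```python
import math
import random

# Assumed mapping (not confirmed by the source):
#   u_s  : semantic information influenced by sensitive attributes
#   u_ns : semantic information unaffected by sensitive attributes
#   u_v1 : environmental cues influenced by sensitive attributes
#   u_v2 : environmental cues unaffected by sensitive attributes
BLOCK_NAMES = ("u_s", "u_ns", "u_v1", "u_v2")


def split_latent(z, dims):
    """Split a flat latent vector into the four named blocks."""
    if len(dims) != len(BLOCK_NAMES) or len(z) != sum(dims):
        raise ValueError("dims must have four entries summing to len(z)")
    parts, start = {}, 0
    for name, size in zip(BLOCK_NAMES, dims):
        parts[name] = z[start:start + size]
        start += size
    return parts


def predict(parts, weights, bias=0.0):
    """Toy logistic classifier that reads only the semantic blocks."""
    semantic = parts["u_s"] + parts["u_ns"]  # list concatenation
    logit = sum(w * u for w, u in zip(weights, semantic)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


rng = random.Random(0)
dims = (2, 3, 2, 1)  # hypothetical block sizes
z = [rng.gauss(0.0, 1.0) for _ in range(sum(dims))]
weights = [rng.gauss(0.0, 1.0) for _ in range(dims[0] + dims[1])]

p = predict(split_latent(z, dims), weights)

# Simulate an environment shift: perturb only the environmental blocks.
n_semantic = dims[0] + dims[1]
z_shifted = z[:n_semantic] + [u + 5.0 for u in z[n_semantic:]]
p_shifted = predict(split_latent(z_shifted, dims), weights)

print(p == p_shifted)
```

Note that the classifier here still reads `u_s`, which by assumption carries sensitive-attribute influence; in the approach described above, that is what the fairness regularization is meant to address, and this sketch does not model it.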
