Learning from Discriminatory Training Data
Przemyslaw A. Grabowicz, Nicholas Perello, Kenta Takatsu
TL;DR
This work reframes fairness in supervised learning as robustness to discriminatory concept shifts that arise when training data contain direct or indirect discrimination. It introduces Optimal Interventional Mixture (OIM), a post-processing approach that probabilistically intervenes on protected attributes to minimize cross-loss on non-discriminatory test data while training on discriminatory data, and extends it with counterfactual mixtures (OCM) to address indirect discrimination. The authors formalize the problem, prove optimality properties under additive discriminatory perturbations, and benchmark OIM against a spectrum of fairness methods on synthetic and real-world datasets, including COMPAS, German Credit, and CelebA. Empirically, OIM achieves strong accuracy with reduced disparities and scales to multiple protected attributes, supporting practical deployment and alignment with legal notions of business necessity and fairness. The work also provides a publicly available FaX-AI library for implementing the proposed methods.
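The core of an interventional mixture can be illustrated in a few lines. This is a minimal sketch under stated assumptions, not the FaX-AI implementation. It assumes that a model is first trained on all features, including the protected attribute Z. At prediction time, the model is evaluated under each intervention do(Z=z), and the results are averaged with weights P(Z=z), here taken as the marginal distribution of Z estimated from the training data. The paper's optimal choice of mixing weights may differ in general. The helper names `interventional_mixture`, `fit_linear` and `linear_predict` are hypothetical. The synthetic data, in which an additive bonus of 1.5 is granted when z = 1, is an invented illustration of direct additive discrimination.

```python
import numpy as np

def fit_linear(X, z, y):
    """Ordinary least squares on the design matrix [1, X, z]; returns coefficients."""
    A = np.column_stack([np.ones(len(y)), X, z])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def linear_predict(coef):
    """Wrap fitted coefficients as a predict(X, z) callable (hypothetical interface)."""
    def predict(X, z):
        A = np.column_stack([np.ones(len(z)), X, z])
        return A @ coef
    return predict

def interventional_mixture(predict, X, z_values, z_probs):
    """Average predict(X, do(Z=z)) over z, weighted by P(Z=z).

    Assumption: mixing weights are a probability distribution over z_values,
    e.g. the marginal P(Z) estimated from training data.
    """
    X = np.asarray(X, dtype=float)
    z_probs = np.asarray(z_probs, dtype=float)
    if not np.isclose(z_probs.sum(), 1.0):
        raise ValueError("z_probs must sum to 1")
    out = np.zeros(X.shape[0])
    for z_val, p in zip(z_values, z_probs):
        # Intervene: set every individual's protected attribute to z_val.
        z_col = np.full(X.shape[0], z_val, dtype=float)
        out += p * predict(X, z_col)
    return out

# Synthetic "discriminatory" training data: outcome gets +1.5 when z = 1.
rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 1))
z = rng.integers(0, 2, size=n)
y = 2.0 * X[:, 0] + 1.5 * z

# Train on all features, including the protected attribute.
coef = fit_linear(X, z, y)
predict = linear_predict(coef)

# Marginal distribution of Z estimated from the training data.
z_values = np.array([0, 1])
z_probs = np.array([np.mean(z == 0), np.mean(z == 1)])

# Mixture predictions for new individuals; their own z is never used.
X_new = np.array([[0.0], [1.0]])
y_oim = interventional_mixture(predict, X_new, z_values, z_probs)
```

Because the mixture never reads an individual's own z, two people with the same x receive the same score. The learned effect of x is kept, and the additive term learned for z is replaced by its average 1.5 · P(Z=1), shared across groups rather than granted to one.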
Abstract
Supervised learning systems are trained on historical data and, if the data were tainted by discrimination, they may unintentionally learn to discriminate against protected groups. We propose that fair learning methods, despite training on potentially discriminatory datasets, should perform well on fair test datasets. Such dataset shifts crystallize application scenarios for specific fair learning methods. For instance, the removal of direct discrimination can be represented as a particular dataset shift problem. For this scenario, we propose a learning method that provably minimizes model error on fair datasets while blindly training on datasets poisoned with direct additive discrimination. The method is compatible with existing legal systems and addresses the widely discussed issue of the intersectionality of protected groups by striking a balance between those groups. Technically, the method applies probabilistic interventions, has causal and counterfactual formulations, and is computationally lightweight: it can be used with any supervised learning model to prevent direct and indirect discrimination via proxies while maximizing model accuracy for business necessity.
