Wasserstein projection distance for fairness testing of regression models
Wanxin Li, Yongjin P. Park, Khanh Dao Duc
TL;DR
This work extends Wasserstein-projection fairness testing from classification to regression, focusing on expectation-based fairness criteria. It formulates a hypothesis test and an optimal data-perturbation method that moves a model toward fairness while balancing accuracy, supported by a dual reformulation of the test statistic and asymptotic theory with a chi-square limiting distribution. The framework is validated on synthetic data and real-world datasets (student grades and housing prices), where it shows higher specificity than permutation tests and can both detect and mitigate bias. Together, these contributions provide a principled, transport-based toolkit for auditing and correcting fairness in continuous-valued predictive models, with practical relevance to education and housing analytics.
Abstract
Fairness in machine learning is a critical concern, yet most research has focused on classification tasks, leaving regression models underexplored. This paper introduces a Wasserstein projection-based framework for fairness testing in regression models, focusing on expectation-based criteria. We propose a hypothesis-testing approach and an optimal data perturbation method to improve fairness while balancing accuracy. Theoretical results include a detailed categorization of fairness criteria for regression, a dual reformulation of the Wasserstein projection test statistic, and the derivation of asymptotic bounds and limiting distributions. Experiments on synthetic and real-world datasets demonstrate that the proposed method offers higher specificity than permutation-based tests and that it effectively detects and mitigates biases in real applications such as student performance and housing price prediction.
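For orientation, the permutation-based baseline mentioned above can be sketched as follows. This is not the paper's Wasserstein projection test. It is a minimal illustration that assumes the expectation-based criterion is equality of mean predictions across two groups; the function name `permutation_test_mean_gap` and its parameters are hypothetical.

```python
import numpy as np


def permutation_test_mean_gap(predictions, group, n_permutations=2000, seed=0):
    """Permutation test for a gap in mean predictions between two groups.

    Assumption (not from the paper): fairness is read as equal expected
    prediction across a binary sensitive attribute.
    """
    rng = np.random.default_rng(seed)
    predictions = np.asarray(predictions, dtype=float)
    group = np.asarray(group, dtype=bool)

    def gap(mask):
        # Absolute difference between the two group means.
        return abs(predictions[mask].mean() - predictions[~mask].mean())

    observed = gap(group)

    # Shuffle group labels to simulate the null of no group effect.
    exceed = 0
    for _ in range(n_permutations):
        if gap(rng.permutation(group)) >= observed:
            exceed += 1

    # Add-one correction keeps the p-value strictly positive.
    p_value = (exceed + 1) / (n_permutations + 1)
    return observed, p_value


if __name__ == "__main__":
    # Toy data: group True receives systematically higher predictions.
    preds = np.concatenate([np.zeros(50), np.full(50, 10.0)])
    grp = np.concatenate([np.zeros(50, dtype=bool), np.ones(50, dtype=bool)])
    print(permutation_test_mean_gap(preds, grp))
```

A small p-value suggests the mean predictions differ between the groups. The abstract's claim is that the Wasserstein projection test achieves higher specificity than this kind of baseline.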
