Understanding Fairness Surrogate Functions in Algorithmic Fairness
Wei Yao, Zhanke Zhou, Zhicong Li, Bo Han, Yong Liu
TL;DR
This paper analyzes the reliability of fairness surrogate functions for demographic parity (DP) in algorithmic decision systems. It formalizes the surrogate-fairness gap between the DP definition and common surrogate-based constraints, and shows that unbounded surrogates can cause instability and poor DP satisfaction, largely because of large margin points, i.e., points far from the decision boundary. To remedy this, the authors propose a general sigmoid surrogate $G(x)=\sigma(wx)$ with $w>0$, which bounds the surrogate values, tightens the fairness guarantee, and reduces variance. They also give a theoretical upper bound: if $G(D_\theta(\mathbf{x}))\in[1-\gamma,1]$ and $|\widetilde{DDP}_{\mathcal{S}}(G)|\le\epsilon$, then $|\widehat{DDP}_{\mathcal{S}}|\le\tfrac{1}{2}\epsilon+\gamma$. They further introduce Balanced Surrogate, an iterative method that adjusts a balance factor between groups to reduce the gap during training. Empirical evaluation on the Adult, Bank Marketing, and COMPAS datasets shows that the general sigmoid surrogate improves fairness and stability while preserving accuracy, and that Balanced Surrogate improves the performance of unbounded surrogates. Overall, the work provides both theoretical and algorithmic tools to better align surrogate-based fairness with the underlying DP definition and to mitigate issues arising from large margin points and data imbalance.
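As a rough illustration of the quantities named above, the following sketch computes an empirical DP gap and its general sigmoid surrogate counterpart. The function names, the toy scores, the sign convention (group 1 minus group 0), and the decision rule (predict positive when the score $D_\theta(\mathbf{x})$ is positive) are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def sigmoid(z):
    """Logistic function sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def empirical_ddp(scores, groups):
    """Gap in positive-prediction rates (group 1 minus group 0).

    Assumes a point is predicted positive when its score is > 0.
    """
    preds = (scores > 0).astype(float)
    return preds[groups == 1].mean() - preds[groups == 0].mean()

def surrogate_ddp(scores, groups, w=1.0):
    """Same gap with the indicator replaced by G(x) = sigmoid(w * x), w > 0."""
    g = sigmoid(w * scores)
    return g[groups == 1].mean() - g[groups == 0].mean()

# Hypothetical classifier scores and binary group labels (toy data).
scores = np.array([2.0, 1.5, -0.5, 3.0, -1.0, -2.0, 0.5, -3.0])
groups = np.array([1, 1, 1, 1, 0, 0, 0, 0])

print("empirical gap:", empirical_ddp(scores, groups))
for w in (0.5, 1.0, 5.0, 50.0):
    print(f"w={w}: surrogate gap = {surrogate_ddp(scores, groups, w):.4f}")
```

In this toy setting, the surrogate gap approaches the empirical gap as $w$ grows, because $\sigma(wx)$ approaches the 0/1 indicator of $x>0$. In training, a very large $w$ would also flatten the gradients, so the choice of $w$ is a trade-off that this sketch does not explore.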
Abstract
It has been observed that machine learning algorithms can exhibit biased predictions against certain population groups. To mitigate such bias while achieving comparable accuracy, a promising approach is to introduce surrogate functions of the fairness definition of interest and solve a constrained optimization problem. However, previous work has found that such fairness surrogate functions may yield unfair results and high instability. To understand this behavior, we take a widely used fairness definition, demographic parity, as an example and show that there is a surrogate-fairness gap between the fairness definition and the fairness surrogate function. Our theoretical analysis and experimental results on this gap further indicate that fairness and stability are affected by points far from the decision boundary, which is the large margin points issue investigated in this paper. To address it, we propose the general sigmoid surrogate, which simultaneously reduces both the surrogate-fairness gap and the variance, and we offer a rigorous upper bound on fairness and stability. Interestingly, the theory also yields two insights: both handling the large margin points and obtaining a more balanced dataset are beneficial to fairness and stability. Furthermore, we develop a novel and general algorithm called Balanced Surrogate, which iteratively reduces the gap to mitigate unfairness. Finally, we provide empirical evidence that our methods consistently improve fairness and stability while maintaining accuracy comparable to the baselines on three real-world datasets.
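To make the large margin points issue concrete, here is a minimal sketch on hypothetical data. It compares an unbounded linear surrogate $G(x)=x$ with a bounded sigmoid surrogate. The toy scores, the helper `group_gap`, the weight 5, and the decision rule (positive score means positive prediction) are assumptions for illustration only.

```python
import numpy as np

def group_gap(values, groups):
    """Mean of `values` in group 1 minus the mean in group 0."""
    return values[groups == 1].mean() - values[groups == 0].mean()

# Hypothetical scores; a positive score is assumed to mean a positive prediction.
scores = np.array([0.5, 0.5, 0.5, -3.5,      # group 1: last point lies far from the boundary
                   -0.5, -0.5, -0.5, -0.5])  # group 0
groups = np.array([1, 1, 1, 1, 0, 0, 0, 0])

# Gap in actual positive-prediction rates between the two groups.
true_gap = group_gap((scores > 0).astype(float), groups)

# Unbounded linear surrogate G(x) = x: the single far-away point offsets the rest.
linear_gap = group_gap(scores, groups)

# Bounded sigmoid surrogate G(x) = sigmoid(5 * x): the far-away point saturates near 0.
sigmoid_gap = group_gap(1.0 / (1.0 + np.exp(-5.0 * scores)), groups)

print(true_gap, linear_gap, sigmoid_gap)
```

Here the gap in prediction rates is 0.75, yet the linear surrogate reports a gap of exactly 0 because one large margin point cancels the other three. The sigmoid surrogate caps that point's contribution and reports a gap of roughly 0.62, which is much closer to the true value.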
