On the (In)Compatibility between Group Fairness and Individual Fairness

Shizhou Xu; Thomas Strohmer

TL;DR

The paper analyzes when group fairness via statistical parity can coexist with individual fairness, focusing on post-processing for $L^2$-loss and a Wasserstein-disparity-based Pareto frontier. It proves an intrinsic incompatibility between the optimal statistical-parity learning and uniform Lipschitz IF, while providing a concrete, verifiable condition under which $(\epsilon,\delta)$-IF compatibility holds, and characterizes how much of the Pareto frontier remains compatible. It then establishes composition guarantees for combining a trained model with post-processing steps to maintain IF guarantees, and demonstrates these concepts experimentally on LSAC and CRIME datasets using affine Wasserstein barycenters and Pareto-frontier estimation. The practical impact is a principled framework to balance utility and fairness, enabling practitioners to select Pareto-optimal post-processing strategies that respect individual fairness constraints. The findings offer actionable guidance for deploying fair post-processing pipelines with provable compatibility guarantees.
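For intuition about the post-processing step, here is a minimal sketch of the univariate case. It assumes the standard quantile-averaging construction of a one-dimensional $\mathcal{W}_2$ barycenter and is not the authors' implementation. The names `quantile` and `barycenter_postprocess` and the interpolation weight `t` are hypothetical. Using `t` to move between the original prediction and the barycenter is an assumed stand-in for moving along the Pareto frontier, and the paper's exact parametrization may differ.

```python
from collections import defaultdict

def quantile(sorted_vals, q):
    """Linearly interpolated empirical quantile of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])

def barycenter_postprocess(y_hat, z, t=1.0):
    """Push each group's predictions toward the 1-D W2 barycenter.

    y_hat : list of scalar predictions
    z     : list of group labels, same length as y_hat
    t     : assumed interpolation weight; 0 keeps y_hat, 1 maps onto the barycenter
    """
    groups = defaultdict(list)  # group label -> indices of its members
    for i, g in enumerate(z):
        groups[g].append(i)

    n = len(y_hat)
    sorted_vals = {g: sorted(y_hat[i] for i in idx) for g, idx in groups.items()}
    weights = {g: len(idx) / n for g, idx in groups.items()}  # group proportions

    out = [0.0] * n
    for g, idx in groups.items():
        order = sorted(idx, key=lambda i: y_hat[i])
        for rank, i in enumerate(order):
            # Relative position of this prediction inside its own group.
            q = (rank + 0.5) / len(order)
            # Barycenter quantile = weighted average of the group quantiles at q.
            target = sum(weights[h] * quantile(sorted_vals[h], q) for h in groups)
            out[i] = (1.0 - t) * y_hat[i] + t * target
    return out

y = [0.0, 1.0, 2.0, 3.0, 10.0, 11.0, 12.0, 13.0]
z = [0, 0, 0, 0, 1, 1, 1, 1]
fair = barycenter_postprocess(y, z)         # both groups now share one distribution
half = barycenter_postprocess(y, z, t=0.5)  # partial repair, some disparity remains
```

With `t=1.0` the two groups receive identical output distributions while the ordering inside each group is preserved, which is the "relative similarity within groups" that statistical parity respects.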

Abstract

We study the compatibility between the optimal statistical parity solutions and individual fairness. While individual fairness seeks to treat similar individuals similarly, optimal statistical parity aims to provide similar treatment to individuals who share relative similarity within their respective sensitive groups. The two perspectives, while both desirable, often come into conflict in applications. Our goal in this work is to analyze the existence of this conflict and its potential solution. In particular, we establish sufficient (sharp) conditions for the compatibility between the optimal (post-processing) statistical parity $L^2$ learning and the ($K$-Lipschitz or $(\epsilon,\delta)$) individual fairness requirements. Furthermore, when the two conflict, we first relax the former to the Pareto frontier (or equivalently the optimal trade-off) between $L^2$ error and statistical disparity, and then analyze the compatibility between the frontier and the individual fairness requirements. Our analysis identifies regions along the Pareto frontier that satisfy individual fairness requirements. Lastly, we provide individual fairness guarantees for the composition of a trained model and the optimal post-processing step, so that one can determine the compatibility of the post-processed model. This provides practitioners with a valuable approach to attain Pareto optimality for statistical parity while adhering to the constraints of individual fairness.
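The two individual fairness notions named in the abstract can be probed on a finite sample. The sketch below assumes the common reading of $(\epsilon,\delta)$-IF, namely that $d_X(x,x') \le \epsilon$ implies $d_Y(f(x),f(x')) \le \delta$; the paper's own definitions may differ in detail. The function names, the metrics `d_x` and `d_y`, and the toy map `f` are all hypothetical.

```python
from itertools import combinations

def eps_delta_violations(xs, f, eps, delta, d_x, d_y):
    """Pairs that are eps-close in input but more than delta apart in output."""
    return [(a, b) for a, b in combinations(xs, 2)
            if d_x(a, b) <= eps and d_y(f(a), f(b)) > delta]

def empirical_lipschitz(xs, f, d_x, d_y):
    """Largest observed ratio d_y(f(a), f(b)) / d_x(a, b) over distinct sample pairs."""
    ratios = [d_y(f(a), f(b)) / d_x(a, b)
              for a, b in combinations(xs, 2) if d_x(a, b) > 0]
    return max(ratios, default=0.0)

# Toy data: each point is (feature, group). The map shifts group 1 by 5,
# mimicking a group-dependent post-processing step.
points = [(0.0, 0), (0.1, 0), (0.0, 1)]
f = lambda p: p[0] + 5 * p[1]
d_x = lambda a, b: abs(a[0] - b[0])  # similarity metric that ignores the group label
d_y = lambda u, v: abs(u - v)

bad_pairs = eps_delta_violations(points, f, eps=0.2, delta=1.0, d_x=d_x, d_y=d_y)
k_hat = empirical_lipschitz(points, f, d_x, d_y)
```

In this toy case every cross-group pair violates the constraint, because a group-dependent shift separates individuals who are close under a group-blind metric. A sample check like this can only expose violations; it cannot certify a uniform guarantee.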

Paper Structure (28 sections, 9 theorems, 59 equations, 3 figures, 1 table)

Introduction
Related Work and Contribution
Generalized Individual Fairness Definitions
Problem Setting
Statistical Parity Enhanced by Utility Optimization
Individual Fairness as an Additional Constraint
Preliminaries on the (Pareto) Optimal Fair $L^2$ Learning
Quantification of Statistical Disparity
Optimal Statistical Disparity $L^2$ Learning and the Pareto Frontier
Compatibility between the Optimal Statistical Parity $L^2$ Learning and Individual Fairness
Optimal Statistical Parity $L^2$ Learning and Lipschitz-IF
Optimal Fair $L^2$ Learning and $(\epsilon,\delta)$-IF
Compatibility between Pareto Frontier and $(\epsilon,\delta)$-IF
Composition Results
Empirical Study: Fair Supervised Learning
...and 13 more sections

Key Result

Lemma 2.2

Assume that $\hat{Y}$ has sensitive conditional distributions satisfying $\{\mathcal{L}(\hat{Y}_z)\}_{z \in \mathcal{Z}} =: \{\mu_z\}_{z \in \mathcal{Z}} \subset \mathcal{P}_{2,ac}(\mathcal{Y})$. Then there exists a unique $f^* \in L^2(\mathcal{Y} \times \mathcal{Z},\mathcal{Y})$, defined for $\lambda$-a.e. $z \in \mathcal{Z}$, such that

Figures (3)

Figure 1: Univariate regression test on LSAC. All three rows show the $L^2$ loss and Wasserstein disparity of: the original prediction (LR or ANN); the prediction using data excluding $Z$ (LR or ANN + Excluding Z); the exact post-processing $\mathcal{W}_2$ barycenter obtained by matching cumulative distribution functions (CDFs) (LR or ANN + chzhen2020fair); the optimal affine estimate of the post-processing $\mathcal{W}_2$ barycenter (LR or ANN + post-proc. Pseudo-barycenter); the Pareto frontier estimated by the optimal affine maps (LR or ANN + post-proc. Pareto Est.); and the portion of the estimated Pareto frontier that is compatible with the corresponding $(\epsilon,\delta)$-IF constraints. Here, $L(f^*) = 0.959$ for the linear regression prediction and $L(f^*) = 1.250$ for the ANN prediction. For each $(\epsilon,\delta)$-IF constraint, the compatible portion is the first $\frac{\delta-\epsilon}{2L(f^*)}$ fraction of the Pareto frontier. More generally, each percentage-point increase in $\frac{\delta-\epsilon}{2L(f^*)}$ makes one more percent of the Pareto frontier compatible. The portion is also guaranteed to satisfy $(\epsilon, \epsilon + (\delta - \epsilon))$-IF for all $\epsilon \in [0,\infty)$.
Figure 2: Univariate regression test on CRIME. All three rows show the $L^2$ loss and Wasserstein disparity of: the original prediction (LR or ANN); the prediction using data excluding $Z$ (LR or ANN + Excluding Z); the exact post-processing $\mathcal{W}_2$ barycenter obtained by CDF matching (LR or ANN + chzhen2020fair); the optimal affine estimate of the post-processing $\mathcal{W}_2$ barycenter (LR or ANN + post-proc. Pseudo-barycenter); the Pareto frontier estimated by the optimal affine maps (LR or ANN + post-proc. Pareto Est.); and the portion of the estimated Pareto frontier that is compatible with the corresponding $(\epsilon,\delta)$-IF constraints. Here, $L(f^*) = 1.045$ for the linear regression prediction and $L(f^*) = 1.385$ for the ANN prediction. Each percentage-point increase in $\frac{\delta-\epsilon}{2L(f^*)}$ makes one more percent of the Pareto frontier compatible.
Figure 3: Multivariate regression test on CRIME. All three rows show the $L^2$ loss and Wasserstein disparity of: the original prediction (LR or ANN); the prediction using data excluding $Z$ (LR or ANN + Excluding Z); the optimal affine estimate of the post-processing $\mathcal{W}_2$ barycenter (LR or ANN + post-proc. Pseudo-barycenter); the Pareto frontier estimated by the optimal affine maps (LR or ANN + post-proc. Pareto Est.); and the portion of the estimated Pareto frontier that is compatible with the corresponding $(\epsilon,\delta)$-IF constraints. Note that excluding $Z$ now removes only a limited amount of Wasserstein disparity because the dependent variable is multidimensional. Here, $L(f^*) = 3.396$ for the linear regression prediction and $L(f^*) = 4.434$ for the ANN prediction. Each percentage-point increase in $\frac{\delta-\epsilon}{2L(f^*)}$ makes one more percent of the Pareto frontier compatible.
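The compatible portion quoted in the captions, $\frac{\delta-\epsilon}{2L(f^*)}$, is easy to evaluate for a given constraint. The sketch below plugs in the $L(f^*)$ values reported in the three captions. The function name, the dictionary keys, and the clipping of the result to $[0, 1]$ are assumptions added for illustration, as is the example constraint $(\epsilon,\delta) = (0.1, 0.6)$.

```python
def compatible_fraction(eps, delta, l_f_star):
    """Fraction of the Pareto frontier compatible with (eps, delta)-IF.

    Uses the caption formula (delta - eps) / (2 * L(f*)); clipping the value
    to [0, 1] is an assumption so the result reads as a fraction.
    """
    if l_f_star <= 0:
        raise ValueError("L(f*) must be positive")
    raw = (delta - eps) / (2.0 * l_f_star)
    return min(1.0, max(0.0, raw))

# L(f*) values as reported in the figure captions.
REPORTED_L = {
    ("LSAC", "LR"): 0.959,
    ("LSAC", "ANN"): 1.250,
    ("CRIME univariate", "LR"): 1.045,
    ("CRIME univariate", "ANN"): 1.385,
    ("CRIME multivariate", "LR"): 3.396,
    ("CRIME multivariate", "ANN"): 4.434,
}

eps, delta = 0.1, 0.6  # hypothetical constraint, not taken from the paper
for (dataset, model), l_val in REPORTED_L.items():
    frac = compatible_fraction(eps, delta, l_val)
    print(f"{dataset:20s} {model:4s} compatible fraction = {frac:.3f}")
```

Because the formula depends on $\epsilon$ and $\delta$ only through the gap $\delta - \epsilon$, any two constraints with the same gap yield the same compatible portion, and a larger $L(f^*)$ shrinks that portion proportionally.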

Theorems & Definitions (25)

Definition 1.1: Statistical parity
Remark 1.1: Fairness through awareness
Definition 1.2: Uniform K-Lipschitz-IF
Definition 1.3: Uniform $(\epsilon,\delta)$-IF
Remark 1.2: Main difference between $(\epsilon,\delta)$-IF and K-Lipschitz-IF
Definition 1.4: $(\epsilon,\delta)$-IF constrained admissible set
Remark 1.3: Choice of the admissible set
Remark 1.4: Compatibility analysis for pre-processing
Remark 1.5: Relative vs absolute similarity
Example 1.1: College admission
...and 15 more
