Auditing and Enforcing Conditional Fairness via Optimal Transport

Mohsen Ghassemi, Alan Mishler, Niccolo Dalmasso, Luhao Zhang, Vamsi K. Potluru, Tucker Balch, Manuela Veloso

TL;DR

Novel measures of conditional demographic disparity (CDD) are proposed which rely on statistical distances borrowed from the optimal transport literature and are designed to target conditional demographic parity even when the conditioning variable has many levels.

Abstract

Conditional demographic parity (CDP) is a measure of the demographic parity of a predictive model or decision process when conditioning on an additional feature or set of features. Many algorithmic fairness techniques exist to target demographic parity, but CDP is much harder to achieve, particularly when the conditioning variable has many levels and/or when the model outputs are continuous. The problem of auditing and enforcing CDP is understudied in the literature. In light of this, we propose novel measures of conditional demographic disparity (CDD) which rely on statistical distances borrowed from the optimal transport literature. We further design and evaluate regularization-based approaches built on these CDD measures. Our methods, FairBiT and FairLeap, allow us to target CDP even when the conditioning variable has many levels. When model outputs are continuous, our methods target full equality of the conditional distributions, unlike other methods that only consider first moments or related proxy quantities. We validate the efficacy of our approaches on real-world datasets.
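To make the idea of conditioning concrete, the following is a minimal sketch of a naive empirical audit, not the CDD measures defined in the paper (those definitions are not reproduced on this page). It assumes a binary sensitive attribute coded 0/1 and a discrete conditioning feature, computes the 1-Wasserstein distance between the two groups' model outputs within each level, and averages the per-level distances weighted by level size. The function names `wasserstein_1d` and `naive_cdd`, the weighting scheme, and the toy data are all illustrative assumptions.

```python
from bisect import bisect_right
from collections import defaultdict


def wasserstein_1d(u, v):
    """1-Wasserstein distance between two 1-D empirical samples,
    computed as the integral of |CDF_u - CDF_v| over the pooled grid."""
    u, v = sorted(u), sorted(v)
    grid = sorted(u + v)
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        cdf_u = bisect_right(u, left) / len(u)
        cdf_v = bisect_right(v, left) / len(v)
        total += abs(cdf_u - cdf_v) * (right - left)
    return total


def naive_cdd(scores, groups, levels):
    """Size-weighted average over levels of the per-level distance between
    the outputs of group 0 and group 1 (hypothetical helper, assumption:
    groups are coded 0/1). Levels missing one of the groups are skipped."""
    by_level = defaultdict(lambda: ([], []))
    for s, a, l in zip(scores, groups, levels):
        by_level[l][a].append(s)
    n_used, total = 0, 0.0
    for s0, s1 in by_level.values():
        if not s0 or not s1:
            continue
        n = len(s0) + len(s1)
        total += n * wasserstein_1d(s0, s1)
        n_used += n
    return total / n_used if n_used else float("nan")


# Toy data: outputs depend only on the conditioning level, but the two
# groups are distributed differently across levels.
scores = [0.2, 0.2, 0.2, 0.2, 0.8, 0.8, 0.8, 0.8]
groups = [0, 0, 0, 1, 0, 1, 1, 1]
levels = ["low", "low", "low", "low", "high", "high", "high", "high"]

conditional = naive_cdd(scores, groups, levels)
marginal = wasserstein_1d(
    [s for s, a in zip(scores, groups) if a == 0],
    [s for s, a in zip(scores, groups) if a == 1],
)
```

In this toy example the unconditional (marginal) distance between the groups is nonzero while the per-level distances vanish, which is the situation CDP is meant to distinguish from a violation of plain demographic parity. A per-level approach like this one becomes unreliable when the conditioning variable has many levels with few samples each, which is the regime the abstract identifies as hard.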

Paper Structure

This paper contains 39 sections, 4 theorems, 38 equations, 4 figures, 13 tables, 1 algorithm.

Key Result

Proposition 4.2

Consider a model $f:\mathcal{X} \to \mathcal{Y}$. Let $\mathcal{L}$ be the support of the legitimate feature $L$. Let $\underline d= \min_{l,l'\in \mathcal{L}} \|l-l'\|_p^p$ denote the minimum distance between two levels of the legitimate features. Moreover, let $\bar{d} = \max_{x,x'\in \mathcal{X}

Figures (4)

  • Figure 1: Conditional demographic disparity ($\mathsf{CDD}$) versus $R_{\text{DCFR}}$ (left) and the value of the FairBiT regularizer (right) in a synthetic loan setting, varying the proportion of males vs. females and the loan acceptance rates. The proportion of males vs. females controls the slope of the CDD-$R_{\text{DCFR}}$ curve, with higher ratios yielding steeper curves. A 45-degree line only occurs if the ratio of males to females is 1:1.
  • Figure 2: Fairness-predictive power trade-offs and Pareto frontiers for the four real datasets. Top row: Trade-offs when measuring fairness using $\mathsf{CDD}^{\mathsf{wass}}$ (we use a "normalized" version for presentation purposes; see Appendix \ref{sec:app:exp_details} for details). Bottom row: Trade-offs when measuring fairness using $\mathsf{CDD}^{\ell_p}$ (with $\mathbb Q(L) = \mathbb U(L)$ and $p=1$). Predictive power (PP) is measured by AUC for classification tasks (first and second columns from left) and MSE for regression tasks (third and fourth columns from left). Results are averaged over 10 runs, with multiple points per model indicating different hyper-parameter values. Overall, FairBiT and the variants of FairLeap are consistently part of the Pareto frontier, hence providing better fairness-PP trade-offs than many of the other proposed methods. See text and Appendix \ref{sec:app:exp_details} for more details.
  • Figure 3: Fairness-predictive power trade-offs for the Adult, Drug, Communities and Crime, and LawSchool datasets. The top row presents the results when fairness is measured by $\mathsf{CDD}^{\mathsf{wass}}_{f}$. The bottom row presents the results when the fairness metric is $\mathsf{CDD}^{\ell_p}_{f}$ (with $\mathbb Q(L) = \mathbb U(L)$ and $p=1$). Predictive power (PP) is measured by AUC for classification and MSE for regression. Results are averaged over 10 runs, with different values for the same methods due to different hyper-parameter settings; see Appendix \ref{sec:app:exp_details} for details. These figures include legit-only as a reference. Legit-only, unsurprisingly, achieves full conditional parity but suffers significantly in terms of predictive performance.
  • Figure 4: Fairness-performance trade-offs for the Adult, Drug, Communities and Crime, and LawSchool datasets. The top row shows the results when fairness is measured by $\mathsf{CDD}^{\ell_p}_{f}$ with $\mathbb Q(L) = \mathbb{P}(L)$ and $p=1$. The middle row shows the results when fairness is measured by $\mathsf{CDD}^{\ell_p}_{f}$ with $\mathbb Q(L) = \frac{\mathbb{P}(L|A=0)+\mathbb{P}(L|A=1)}{2}$ and $p=1$. The bottom row shows the results when fairness is measured by demographic disparity (DD), specifically using the 1-Wasserstein distance. Predictive power is measured by AUC for classification and MSE for regression. Results are averaged over 10 runs, with different values for the same methods due to different hyper-parameter settings; see Appendix \ref{sec:app:exp_details} for details. Overall, when fairness is measured by CDD metrics, FairBiT and the variants of FairLeap are consistently among the highest performing, often providing better fairness-predictive power trade-offs than the other proposed methods. When fairness is measured by DD, Wasserstein Reg. unsurprisingly performs best. In this case, DCFR performs well, especially on the regression datasets, with some points on the frontier. FairBiT and the variants of FairLeap generally do not improve demographic disparity by much, although they still have many points on the Pareto frontier in classification.

Theorems & Definitions (19)

  • Definition 2.1: Demographic parity (DP)
  • Definition 2.2: Conditional demographic parity (CDP)
  • Definition 3.1: CDD in the Wasserstein sense
  • Definition 3.2: CDD in the $\ell_p$ sense
  • Definition 4.1: Bi-causal transport distance (BCD)
  • Proposition 4.2
  • Proposition 5.1
  • Remark 5.2
  • Definition A.1: Causal and bi-causal transport plans
  • Definition A.2: Bi-causal transport distance (BCD)
  • ...and 9 more