FairWASP: Fast and Optimal Fair Wasserstein Pre-processing

Zikai Xiong; Niccolò Dalmasso; Alan Mishler; Vamsi K. Potluru; Tucker Balch; Manuela Veloso

TL;DR

This work presents FairWASP, a novel pre-processing approach designed to reduce disparities in classification datasets without modifying the original data, and shows theoretically that integer weights are optimal, which means the method can be equivalently understood as duplicating or eliminating samples.

Abstract

Recent years have seen a surge of machine learning approaches aimed at reducing disparities in model outputs across different subgroups. In many settings, training data may be used in multiple downstream applications by different users, which means it may be most effective to intervene on the training data itself. In this work, we present FairWASP, a novel pre-processing approach designed to reduce disparities in classification datasets without modifying the original data. FairWASP returns sample-level weights such that the reweighted dataset minimizes the Wasserstein distance to the original dataset while satisfying (an empirical version of) demographic parity, a popular fairness criterion. We show theoretically that integer weights are optimal, which means our method can be equivalently understood as duplicating or eliminating samples. FairWASP can therefore be used to construct datasets which can be fed into any classification method, not just methods which accept sample weights. Our work is based on reformulating the pre-processing task as a large-scale mixed-integer program (MIP), for which we propose a highly efficient algorithm based on the cutting plane method. Experiments demonstrate that our proposed optimization algorithm significantly outperforms state-of-the-art commercial solvers in solving both the MIP and its linear program relaxation. Further experiments highlight the competitive performance of FairWASP in reducing disparities while preserving accuracy in downstream classification settings.
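The equivalence between integer sample weights and duplicating or eliminating samples can be illustrated with a small sketch. Everything here is an assumption made for illustration: the helper names `weighted_dp_gap` and `materialize` are hypothetical, the toy data is invented, the weights are hand-picked rather than produced by FairWASP, and the gap below is just one simple empirical notion of demographic parity that may differ from the paper's exact constraint.

```python
import numpy as np

def weighted_dp_gap(y, d, w):
    """Largest absolute gap between a group's weighted positive rate
    and the overall weighted positive rate (illustrative DP measure)."""
    y = np.asarray(y, dtype=float)
    d = np.asarray(d)
    w = np.asarray(w, dtype=float)
    overall = np.average(y, weights=w)
    gaps = []
    for g in np.unique(d):
        m = d == g
        if w[m].sum() > 0:  # skip groups that were weighted out entirely
            gaps.append(abs(np.average(y[m], weights=w[m]) - overall))
    return max(gaps)

def materialize(X, y, d, w):
    """Turn integer weights into a plain dataset: weight 0 drops a row,
    weight k keeps k copies of it."""
    idx = np.repeat(np.arange(len(w)), np.asarray(w, dtype=int))
    return X[idx], y[idx], d[idx]

# Toy data (invented): one feature, binary label y, binary group d.
X = np.arange(8, dtype=float).reshape(8, 1)
y = np.array([1, 1, 1, 0, 1, 0, 0, 0])
d = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Hand-picked integer weights (NOT FairWASP output), summing to n = 8.
w = np.array([1, 1, 0, 2, 2, 1, 1, 0])

gap_before = weighted_dp_gap(y, d, np.ones(len(y)))
gap_after = weighted_dp_gap(y, d, w)

# The duplicated/pruned dataset has the same gap with uniform weights.
X2, y2, d2 = materialize(X, y, d, w)
gap_dup = weighted_dp_gap(y2, d2, np.ones(len(y2)))

print(gap_before, gap_after, gap_dup)
```

Because the materialized dataset carries no weights, it can be passed to a classifier that has no `sample_weight`-style argument, which is the practical point the abstract makes about integer solutions.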

Paper Structure (29 sections, 7 theorems, 35 equations, 4 figures, 1 table, 1 algorithm)

Introduction
Contributions
Background
Setup
Demographic Parity (DP)
Pre-processing via Reweighting
Wasserstein Distance
FairWASP Optimization Problem
Reformulations of the Optimization Problem
Step 1: Reformulating the FairWASP Optimization Problem as a MIP
Step 2: Dual Problem of the LP Relaxation
Step 3: Using the Dual Solution to Solve the Original MIP
Cutting Plane Method for the Reformulated Problem
Comparison with Other LP Algorithms
FairWASP-PW: Extension to Pairwise Demographic Parity Constraints
...and 14 more sections

Key Result

Lemma 1

The function $G(\bar{C}) \stackrel{\text{def.}}{=} \max_{P \in S_n} \langle \bar{C}, P\rangle$ is a convex function of $\bar{C} \in \mathbb{R}^{n\times n}$, with the following function value and subgradient. For each $i\in [n]$, let $\bar{c}_{ij_i^\star}$ denote a largest component on the $i$-th row of $\bar{C}$ [...]; then $P^\star \in \arg\max_{P \in S_n} \langle \bar{C}, P\rangle$ and $P^\star \in \partial G(\bar{C})$.
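The convexity and subgradient claims follow from a standard argument for pointwise maxima of linear functions. The sketch below is not the paper's proof; it assumes only that the maximum over $S_n$ is attained, takes arbitrary matrices $A, B, \bar{C}' \in \mathbb{R}^{n\times n}$ and $\lambda \in [0,1]$ (symbols introduced here for illustration), and lets $P^\star$ be any maximizer at $\bar{C}$.

```latex
\begin{align*}
G(\lambda A + (1-\lambda) B)
  &= \max_{P \in S_n} \big( \lambda \langle A, P\rangle
     + (1-\lambda) \langle B, P\rangle \big) \\
  &\le \lambda \max_{P \in S_n} \langle A, P\rangle
     + (1-\lambda) \max_{P \in S_n} \langle B, P\rangle
   = \lambda\, G(A) + (1-\lambda)\, G(B), \\[4pt]
G(\bar{C}')
  &\ge \langle \bar{C}', P^\star \rangle
   = \langle \bar{C}, P^\star \rangle
     + \langle \bar{C}' - \bar{C},\, P^\star \rangle
   = G(\bar{C}) + \langle \bar{C}' - \bar{C},\, P^\star \rangle .
\end{align*}
```

The first chain is the definition of convexity; the second is the subgradient inequality, which is why any maximizer $P^\star$ also serves as a subgradient of $G$ at $\bar{C}$.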

Figures (4)

Figure 1: Speed comparison with commercial solvers. FairWASP has significantly better runtime and scalability.
Figure 2: Downstream fairness-utility tradeoff, indicated by the demographic disparity and downstream classifier area under the curve (AUC). The x-axis refers to the absolute difference in the mean classifier outcome for the two groups, with a value of 0 corresponding to perfect demographic parity. Points and error bars correspond to averages plus/minus one standard deviation, computed over 10 different train/test splits. FairWASP and FairWASP-PW consistently provide one of the best tradeoffs, significantly improving over using the original dataset as-is. See text and Supplementary Material C for more details.
Figure 3: Relative objective gaps and fairness violations of the FairWASP solutions in the experiments in Figure 1.
Figure 4: Downstream fairness-utility tradeoff, indicated by the demographic parity and downstream classifier area under the curve (AUC). Points and error bars correspond to averages plus/minus one standard deviation, computed over 10 different train/test splits. Each point is a method-parameter combination. See text for more details.

Theorems & Definitions (15)

Lemma 1
proof : Proof Sketch
Corollary 2
proof
Theorem 3
proof : Proof Sketch
Lemma 4
Corollary 5
Lemma 6
proof : Proof Sketch
...and 5 more
