Fair Wasserstein Coresets

Zikai Xiong, Niccolò Dalmasso, Shubham Sharma, Freddy Lecue, Daniele Magazzeni, Vamsi K. Potluru, Tucker Balch, Manuela Veloso

TL;DR

This work presents fair Wasserstein coresets (FWC), a novel coreset approach which generates fair synthetic representative samples along with sample-level weights to be used in downstream learning tasks and achieves a competitive fairness-utility tradeoff in downstream models compared to existing approaches.

Abstract

Data distillation and coresets have emerged as popular approaches to generate a smaller representative set of samples for downstream learning tasks to handle large-scale datasets. At the same time, machine learning is being increasingly applied to decision-making processes at a societal level, making it imperative for modelers to address inherent biases towards subgroups present in the data. While current approaches focus on creating fair synthetic representative samples by optimizing local properties relative to the original samples, their impact on downstream learning processes has yet to be explored. In this work, we present fair Wasserstein coresets (FWC), a novel coreset approach which generates fair synthetic representative samples along with sample-level weights to be used in downstream learning tasks. FWC uses an efficient majority minimization algorithm to minimize the Wasserstein distance between the original dataset and the weighted synthetic samples while enforcing demographic parity. We show that an unconstrained version of FWC is equivalent to Lloyd's algorithm for k-medians and k-means clustering. Experiments conducted on both synthetic and real datasets show that FWC: (i) achieves a competitive fairness-utility tradeoff in downstream models compared to existing approaches, (ii) improves downstream fairness when added to the existing training data and (iii) can be used to reduce biases in predictions from large language models (GPT-3.5 and GPT-4).
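The abstract notes that the unconstrained version of FWC is equivalent to Lloyd's algorithm. A minimal sketch of that unconstrained case, assuming squared Euclidean distance (the k-means variant) and NumPy, is given below. It is not the authors' implementation and does not enforce demographic parity. The function names `lloyd_coreset` and `demographic_parity_gap` are hypothetical, and the parity gap shown is one common way to measure demographic disparity for binary predictions and a binary protected attribute.

```python
import numpy as np


def _nearest_center(X, centers):
    # Index of the closest center for every row of X (squared Euclidean distance).
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def lloyd_coreset(X, m, n_iter=50, seed=0):
    """Plain Lloyd's k-means: returns m representatives and sample-level weights.

    Illustrative only: this is the unconstrained case, with no fairness constraint.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise the representatives with m distinct data points.
    centers = X[rng.choice(len(X), size=m, replace=False)].copy()
    for _ in range(n_iter):
        labels = _nearest_center(X, centers)
        # Update step: move each representative to the mean of its assigned points.
        for j in range(m):
            members = X[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    # Weights are the fraction of original points assigned to each representative.
    labels = _nearest_center(X, centers)
    weights = np.bincount(labels, minlength=m) / len(X)
    return centers, weights


def demographic_parity_gap(y_pred, d):
    """Absolute difference in positive-prediction rates between groups d=1 and d=0."""
    y_pred, d = np.asarray(y_pred, dtype=float), np.asarray(d)
    return abs(y_pred[d == 1].mean() - y_pred[d == 0].mean())
```

The returned weights sum to one, so the representatives and weights together define a discrete distribution that can be compared with the original data. A downstream model could consume them through a per-sample weight argument where its training routine supports one.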

Paper Structure

This paper contains 37 sections, 10 theorems, 42 equations, 7 figures, 8 tables, 1 algorithm.

Key Result

Proposition 2.1

Let $\mathcal{G}^K$ be the class of $K$-layer multilayer perceptrons with ReLU activations, and let $g_\psi \in \mathcal{G}^K$. Then the discrepancy in downstream performance of $g_\psi$ applied to samples from $p_{(\widehat{X}, \widehat{D}); \theta}$ and $p_{(X, D);e}$ is bounded by the 1-Wasserstein distance between these two distributions, where $L_k$, the constant appearing in the bound, is the upper bound on the MLP Lipschitz constant defined in virmaux2018lipschitz.
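
The displayed inequality is not reproduced in this listing. A bound of the following form is consistent with the statement and follows from Kantorovich-Rubinstein duality for any $L_k$-Lipschitz function. Measuring the discrepancy as a gap in expected outputs of $g_\psi$, and writing the 1-Wasserstein distance as $W_1$, are assumptions made here for illustration rather than the paper's exact formulation:

$$
\left| \mathbb{E}_{(x,d) \sim p_{(\widehat{X}, \widehat{D}); \theta}}\, g_\psi(x,d) - \mathbb{E}_{(x,d) \sim p_{(X, D); e}}\, g_\psi(x,d) \right| \le L_k \, W_1\!\left(p_{(\widehat{X}, \widehat{D}); \theta},\, p_{(X, D); e}\right)
$$

Read this way, shrinking the Wasserstein distance between the weighted synthetic samples and the original data tightens a worst-case bound on how differently a downstream MLP behaves on the two.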

Figures (7)

  • Figure 1: Top left: FWC runtime when changing the original dataset size $n$. Others: Fairness-utility tradeoff on real datasets for a downstream MLP classifier, selecting the model with the best fairness-utility tradeoff across three different coreset sizes $m$, with averages taken over 10 runs. FWC consistently achieves a comparable or better tradeoff, as shown by the Pareto frontier (dashed red line, computed over all models and coreset sizes), even when the other coresets are adjusted with a fairness pre-processing technique [33]. See text and Appendix C.2 for more details.
  • Figure 2: Runtime analysis of FWC when varying the size of the coreset $m$ (left) and the dimensionality of the features $p$ (right). We report averages and one standard deviation computed over 10 runs.
  • Figure 3: Fairness-utility tradeoff of a downstream MLP classifier trained using the original training set augmented with coreset representatives, following the augmentation strategy from sharma2020data. Each point shows the best model in terms of fairness-utility tradeoff over various degrees of data augmentation, in addition to the baseline model with no augmentation. Means and standard deviations are taken over 10 runs, with the computed Pareto frontier indicated by the dashed red line.
  • Figure 4: Fairness-utility tradeoff of a downstream MLP classifier trained using the original training set augmented with coreset representatives, following the augmentation strategy from sharma2020data, including all methods mentioned in the experiments section. Each point shows the best model in terms of fairness-utility tradeoff over various degrees of data augmentation, in addition to the baseline model with no augmentation. Averages and standard deviations are computed over 10 runs. The top panel shows means only and the bottom panel shows both means and standard deviations, with the computed Pareto frontier indicated by the dashed red line.
  • Figure 5: Data augmentation fairness-utility tradeoff of a downstream MLP classifier for the Drug dataset when the protected attribute $D$ (gender) is either not included (left) or included (right) as a feature in the learning process. As in the earlier data augmentation figures, the best model across various data augmentation degrees is reported, with averages and standard deviations obtained over 10 runs. FWC successfully reduces the demographic disparity when gender is not used as a feature, but fails to do so when gender is used. This indicates that gender provides strong predictive power for the outcome in question, so reducing disparity would require enforcing fairness either during model training or by post-processing the outputs.
  • ...and 2 more figures

Theorems & Definitions (18)

  • Proposition 2.1
  • Lemma 3.1
  • Lemma 5.1
  • Lemma 5.2
  • Theorem 5.3
  • Theorem 5.4
  • Proposition 5.5
  • Lemma A.1
  • proof
  • Lemma A.2: Essentially Corollary 5 of Xiong2023FairWASP
  • ...and 8 more