Multi-Output Distributional Fairness via Post-Processing

Gang Li, Qihang Lin, Ayush Ghosh, Tianbao Yang

TL;DR

The paper tackles fairness for multi-output models by enforcing distributional parity across groups through post-processing. It generalizes single-output Wasserstein-barycenter post-processing to multi-output settings, using optimal transport to move outputs toward an empirical barycenter and introducing a computationally efficient approximate barycenter plus kernel-based out-of-sample extension. A controllable parameter $\alpha\in[0,1]$ trades off predictive fidelity against fairness, yielding a Pareto frontier between distortion and distributional parity. Empirical results across multi-label, multi-class, and representation-learning tasks show improved joint-output fairness with competitive accuracy, demonstrating the practical viability of task-agnostic, post-processing fairness for complex outputs.

Abstract

Post-processing approaches are becoming prominent techniques for enhancing the fairness of machine learning models because of their intuitiveness, low computational cost, and excellent scalability. However, most existing post-processing methods are designed for task-specific fairness measures and are limited to single-output models. In this paper, we introduce a post-processing method for multi-output models, such as those used for multi-task/multi-class classification and representation learning, to enhance a model's distributional parity, a task-agnostic fairness measure. Existing methods for achieving distributional parity rely on the (inverse) cumulative distribution function of a model's output, which restricts them to single-output models. Extending previous work, we propose to employ optimal transport mappings to move a model's outputs across different groups towards their empirical Wasserstein barycenter. An approximation technique reduces the complexity of computing the exact barycenter, and a kernel regression method extends the process to out-of-sample data. Our empirical studies evaluate the proposed approach against various baselines on multi-task/multi-class classification and representation learning tasks and demonstrate its effectiveness.
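
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes only two groups with equally many samples (so the optimal transport plan is a one-to-one matching and the weighted midpoints of matched pairs form the exact empirical barycenter), it assumes that $\alpha$ interpolates linearly between the original outputs ($\alpha=1$) and the transported outputs ($\alpha=0$), and it uses a Gaussian-kernel Nadaraya-Watson estimator as one possible form of the kernel regression step. The names `barycenter_targets`, `interpolate`, `kernel_extend`, and `bandwidth` are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist


def barycenter_targets(out_a, out_b, weight_a=0.5):
    """Return, for every training output, its image on the empirical barycenter.

    Assumes two groups with the same number of samples, so the optimal
    transport plan under squared Euclidean cost is a permutation.
    """
    out_a = np.asarray(out_a, dtype=float)
    out_b = np.asarray(out_b, dtype=float)
    cost = cdist(out_a, out_b, metric="sqeuclidean")
    rows, cols = linear_sum_assignment(cost)  # optimal one-to-one matching
    # Weighted average of each matched pair lies on the barycenter.
    bary = weight_a * out_a[rows] + (1.0 - weight_a) * out_b[cols]
    target_a = np.empty_like(out_a)
    target_b = np.empty_like(out_b)
    target_a[rows] = bary
    target_b[cols] = bary
    return target_a, target_b


def interpolate(outputs, targets, alpha):
    """alpha = 1 keeps the original outputs; alpha = 0 moves fully to the targets."""
    return alpha * np.asarray(outputs) + (1.0 - alpha) * np.asarray(targets)


def kernel_extend(new_outputs, train_outputs, train_targets, bandwidth=1.0):
    """Nadaraya-Watson estimate of the transport map at unseen outputs."""
    sq = cdist(new_outputs, train_outputs, metric="sqeuclidean")
    # Shift by the row minimum so the largest weight is exp(0) = 1.
    w = np.exp(-(sq - sq.min(axis=1, keepdims=True)) / (2.0 * bandwidth ** 2))
    w /= w.sum(axis=1, keepdims=True)
    return w @ train_targets


# Synthetic 3-dimensional outputs for two groups (illustrative data only).
rng = np.random.default_rng(0)
out_a = rng.normal(0.0, 1.0, size=(50, 3))
out_b = rng.normal(1.5, 0.5, size=(50, 3))

target_a, target_b = barycenter_targets(out_a, out_b)
half_a = interpolate(out_a, target_a, alpha=0.5)   # partial fairness
fair_a = interpolate(out_a, target_a, alpha=0.0)   # full parity on the samples

# Out-of-sample: map fresh group-a outputs with the kernel-smoothed transport.
new_a = rng.normal(0.0, 1.0, size=(5, 3))
new_fair_a = kernel_extend(new_a, out_a, target_a, bandwidth=0.5)
```

With more than two groups or unequal group sizes, the matching step no longer applies directly; a general discrete optimal transport solver and a barycenter routine would be needed instead.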

Paper Structure

This paper contains 20 sections, 4 theorems, 49 equations, 3 figures, and 2 algorithms.

Key Result

Theorem 1

Suppose $\nu_{f^*|s}$ has a density and finite second moments for each $s \in \mathcal{S}$. Then […]. Moreover, if $f_0$ and $\nu_0$ solve the first and second minimization in eqn:char, respectively, then $\nu_0$ is the distribution of $f_0$ and […], where $T_{f^*|s,\nu_0}:\mathbb R^k \rightarrow \mathbb R^k$ is the optimal transport mapping from $\nu_{f^*|s}$ to $\nu_0$.
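
The statement refers to a characterization (eqn:char) equating two minimization problems and to a closed form for $f_0$. By analogy with the known single-output result on fair regression with Wasserstein barycenters, these plausibly take the form sketched below. This is a hedged reconstruction, not the paper's verbatim text: the group weights $p_s = \mathbb{P}(S=s)$, the squared Euclidean distortion, and the wording of the fairness constraint are assumptions.

```latex
\min_{f \text{ satisfies distributional parity}}
  \mathbb{E}\,\bigl\| f^*(X,S) - f(X,S) \bigr\|_2^2
  \;=\;
  \min_{\nu} \sum_{s \in \mathcal{S}} p_s \, W_2^2\bigl(\nu_{f^*|s}, \nu\bigr),
\qquad
f_0(x,s) \;=\; T_{f^*|s,\nu_0}\bigl(f^*(x,s)\bigr).
```

Read this way, the least-distortion fair predictor is obtained by pushing each group's output distribution onto the Wasserstein barycenter $\nu_0$ of the group-conditional output distributions.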

Figures (3)

  • Figure 1: Suppose the original model's outputs (left) minimize prediction error and perform well. One can balance performance and fairness by applying our method with $\alpha = 0.5$ (middle), or achieve exact fairness by setting $\alpha = 0$ (right).
  • Figure 2: (a) Multi-label classification on the CelebA dataset; (b) multi-label classification on the CheXpert dataset; (c) multi-class classification on the Customer dataset.
  • Figure 3: t-SNE visualization of representations on the CelebA dataset. (a) Raw representations from an SSL model; (b) representations after post-processing ($\alpha=0$) by hu2023fairness; (c) representations after post-processing ($\alpha=0$) with TAB (ours); (d) performance on downstream tasks.

Theorems & Definitions (9)

  • Definition 1
  • Definition 2
  • Definition 3
  • Theorem 1
  • Proposition 1
  • Proposition 2
  • Lemma 1
  • Proof
  • Proof