Fairness in Multi-Task Learning via Wasserstein Barycenters

François Hu, Philipp Ratz, Arthur Charpentier

TL;DR

This work tackles fairness in a two-task learning setting with a shared representation under Demographic Parity (DP). It reframes DP fairness as an optimal transport problem using Wasserstein-2 barycenters and derives a closed-form, post-processing fair predictor for both regression and binary classification tasks. A data-driven plug-in estimator, which uses a labeled training set and an unlabeled pool within a You Only Train Once (YOTO) framework, enables practical deployment across arbitrary multi-task (MT) models. Empirical results on the folktables and COMPAS datasets show a substantial reduction in unfairness with a modest degradation in predictive performance, highlighting the method's scalability and applicability for fair decision-making in multi-task contexts.

Abstract

Algorithmic Fairness is an established field in machine learning that aims to reduce biases in data. Recent advances have proposed various methods to ensure fairness in a univariate environment, where the goal is to de-bias a single task. However, extending fairness to a multi-task setting, where more than one objective is optimised using a shared representation, remains underexplored. To bridge this gap, we develop a method that extends the definition of Strong Demographic Parity to multi-task learning using multi-marginal Wasserstein barycenters. Our approach provides a closed form solution for the optimal fair multi-task predictor including both regression and binary classification tasks. We develop a data-driven estimation procedure for the solution and run numerical experiments on both synthetic and real datasets. The empirical results highlight the practical value of our post-processing methodology in promoting fair decision-making.
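
As a rough illustration of the post-processing idea for a single task, the sketch below maps each group's scores onto a common distribution. It assumes the standard one-dimensional Wasserstein-2 barycenter form: apply the group's empirical CDF, then take a $\pi_s$-weighted average of the group-wise quantile functions. The function names `fit_group_scores` and `fair_transform` are hypothetical, the same array serves as both calibration pool and input, and this is not the authors' implementation.

```python
import numpy as np


def fit_group_scores(scores, groups):
    """Store sorted calibration scores per group and the group weights pi_s."""
    scores = np.asarray(scores, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    sorted_scores = {s: np.sort(scores[groups == s]) for s in labels}
    weights = {s: float(np.mean(groups == s)) for s in labels}  # empirical pi_s
    return sorted_scores, weights


def fair_transform(scores, groups, sorted_scores, weights):
    """Map scores to the (assumed) 1-D barycenter of the group-wise distributions."""
    scores = np.asarray(scores, dtype=float)
    groups = np.asarray(groups)
    fair = np.empty_like(scores)
    for s, ref in sorted_scores.items():
        mask = groups == s
        # Empirical CDF of group s evaluated at each score of that group.
        u = np.searchsorted(ref, scores[mask], side="right") / len(ref)
        # Weighted average of all group quantile functions at level u.
        fair[mask] = sum(
            w * np.quantile(sorted_scores[k], u) for k, w in weights.items()
        )
    return fair


# Illustrative synthetic scores: group 1 is shifted upwards by 2.
rng = np.random.default_rng(0)
raw_scores = np.concatenate([rng.normal(0, 1, 2000), rng.normal(2, 1, 2000)])
group_ids = np.repeat([0, 1], 2000)

reference, pi = fit_group_scores(raw_scores, group_ids)
fair_scores = fair_transform(raw_scores, group_ids, reference, pi)

gap_before = abs(raw_scores[group_ids == 0].mean() - raw_scores[group_ids == 1].mean())
gap_after = abs(fair_scores[group_ids == 0].mean() - fair_scores[group_ids == 1].mean())
print(f"mean gap before: {gap_before:.3f}, after: {gap_after:.3f}")
```

Because the map is monotone within each group, the ordering of individuals inside a group is preserved; only the group-wise distributions are aligned. In a multi-task setting one would apply such a transform to each task's score separately, under the same assumptions.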

Paper Structure

This paper contains 19 sections, 2 theorems, 20 equations, 3 figures, 2 tables, 2 algorithms.

Key Result

Theorem

Let the continuity assumption be satisfied. Recall that $\pi_s = \mathbb{P}(S=s)$.

Figures (3)

  • Figure 1: Representation function sharing in a neural network for multi-task learning. The goal in DP-fairness is to construct a set of predictors $\{g^{\text{fair}}_t(\boldsymbol{X}, S)\}_t$ independent from the sensitive feature $S$. $\boldsymbol{X}^i$ refers to the $i$-th feature of $\boldsymbol{X}$.
  • Figure 2: Left: performance, as measured by MSE, for MTL and STL. Here the $\boldsymbol{\lambda}$ parameter was chosen to optimise the regression task, which leads to better outcomes, especially when values are missing in the regression labels. Right: regression estimates before versus after the optimal transport.
  • Figure 3: Joint distribution for scores under unconstrained and DP-fair regimes. Color indicates the presence of the sensitive feature. Note that the joint distribution appears more mixed and the marginal distributions overlap in the DP-fair case.

Theorems & Definitions (8)

  • Remark: Misclassification risk and squared risk
  • Definition: Strong Demographic Parity
  • Definition: Unfairness
  • Definition: Wasserstein-2 distance
  • Theorem: Optimal fair predictions
  • Proof (sketch)
  • Corollary: Group-wise rank statistics
  • Remark: Data splitting
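
Two of the notions listed above, unfairness and the Wasserstein-2 distance, can be made concrete for one-dimensional scores. The sketch below is an assumption-laden illustration rather than the paper's exact definitions: it measures unfairness as the largest Kolmogorov-Smirnov gap between group-wise empirical CDFs (one common way to quantify violations of Strong Demographic Parity) and approximates the Wasserstein-2 distance through quantile functions. The names `ks_unfairness` and `wasserstein2_1d` are hypothetical.

```python
import numpy as np


def ks_unfairness(scores, groups):
    """Largest sup-distance between any two group-wise empirical CDFs."""
    scores = np.asarray(scores, dtype=float)
    groups = np.asarray(groups)
    grid = np.sort(scores)  # evaluate every CDF at all observed scores
    cdfs = []
    for s in np.unique(groups):
        ref = np.sort(scores[groups == s])
        cdfs.append(np.searchsorted(ref, grid, side="right") / len(ref))
    return float(max(np.max(np.abs(a - b)) for a in cdfs for b in cdfs))


def wasserstein2_1d(a, b, n_levels=1000):
    """Approximate 1-D Wasserstein-2 distance via quantiles on a midpoint grid."""
    u = (np.arange(n_levels) + 0.5) / n_levels
    diff = np.quantile(a, u) - np.quantile(b, u)
    return float(np.sqrt(np.mean(diff ** 2)))


# Illustrative usage on synthetic scores for two groups.
rng = np.random.default_rng(0)
scores = np.concatenate([rng.normal(0, 1, 1000), rng.normal(1, 1, 1000)])
groups = np.repeat([0, 1], 1000)
print("KS unfairness:", ks_unfairness(scores, groups))
print("W2 distance:", wasserstein2_1d(scores[groups == 0], scores[groups == 1]))
```

A value of zero for `ks_unfairness` means the group-wise score distributions coincide on the sample, which is the empirical analogue of Strong Demographic Parity under this assumed measure.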