How Robust is your Fair Model? Exploring the Robustness of Diverse Fairness Strategies

Edward Small; Wei Shao; Zeliang Zhang; Peihan Liu; Jeffrey Chan; Kacper Sokol; Flora Salim

TL;DR

This work introduces a robustness criterion for fairness in machine learning, proposing a robustness ratio $R_k$ to quantify how fairness metrics degrade under noise. It systematically compares four fairness strategies (pre-processing, in-processing, post-processing, and baseline) across five benchmark datasets and four group fairness definitions, revealing that threshold-based post-processing is consistently most sensitive to noise. The study combines theoretical results—bounding bias under equalised odds and convergence of distributional distance under noise—with extensive empirical analysis, showing that robustness is highly data- and metric-dependent and that post-processing often loses fairness as noise increases. The proposed framework and the associated FairR toolkit offer a practical way to evaluate and select fairness strategies that remain stable under data perturbations, informing deployment and data-quality considerations in real-world systems.

Abstract

With the introduction of machine learning in high-stakes decision-making, ensuring algorithmic fairness has become an increasingly important problem. In response, many mathematical definitions of fairness have been proposed, and a variety of optimisation techniques have been developed, all designed to maximise a given notion of fairness. However, fair solutions rely on the quality of the training data and can be highly sensitive to noise. Recent studies have shown that robustness (the ability of a model to perform well on unseen data) plays a significant role in the type of strategy that should be used when approaching a new problem; hence, measuring the robustness of these strategies has become a fundamental problem. In this work, we therefore propose a new criterion to measure the robustness of various fairness optimisation strategies: the robustness ratio. We conduct multiple extensive experiments on five benchmark fairness data sets using three of the most popular fairness strategies with respect to four of the most popular definitions of fairness. Our experiments empirically show that fairness methods that rely on threshold optimisation are very sensitive to noise in all the evaluated data sets, despite mostly outperforming the other methods. This is in contrast to the other two methods, which are less fair in low-noise scenarios but fairer in high-noise ones. To the best of our knowledge, we are the first to quantitatively evaluate the robustness of fairness optimisation strategies. This can potentially serve as a guideline for choosing the most suitable fairness strategy for various data sets.

Paper Structure (52 sections, 5 theorems, 60 equations, 7 figures)

Introduction
Related Work
Pre-processing
In-processing
Post-processing
Preliminaries
General Notation
Fairness Metrics
Existing Fairness Optimisation Strategies
Robustness of Fair Solutions
Noise
Continuous Data
Discrete Data
Expected Behaviour of a Model under Noisy Data
Robustness for Fairness
...and 37 more sections

Key Result

Theorem 1

Take two distributions $p\sim N(\mu_p, \sigma^2_p)$ and $q\sim N(\mu_q, \sigma^2_q)$, which are parameterised by their mean $\mu$ and variance $\sigma^2$. Adding noise $\delta_k\sim N(0, k^2)$ to both distributions leads to (Proof in the appendix.)
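As a hedged illustration of the setting of Theorem 1 (whose exact distance measure is not given here): if the noise $\delta_k$ is independent of both distributions, then $p+\delta_k \sim N(\mu_p, \sigma_p^2 + k^2)$ and $q+\delta_k \sim N(\mu_q, \sigma_q^2 + k^2)$, so the added variance $k^2$ increasingly dominates both. The sketch below assumes Kullback-Leibler divergence as the distributional distance, which is an assumption and not necessarily the paper's choice, and evaluates its closed form as $k$ grows. The function names and example parameters are hypothetical.

```python
import math
from typing import Tuple


def noisy_gaussian(mu: float, var: float, k: float) -> Tuple[float, float]:
    """Adding independent N(0, k^2) noise to N(mu, var) gives N(mu, var + k^2)."""
    return mu, var + k * k


def kl_gaussian(mu_p: float, var_p: float, mu_q: float, var_q: float) -> float:
    """Closed-form KL(P || Q) for univariate Gaussians P = N(mu_p, var_p), Q = N(mu_q, var_q)."""
    return 0.5 * (math.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)


def kl_under_noise(mu_p: float, var_p: float, mu_q: float, var_q: float, k: float) -> float:
    """KL divergence between p and q after both are perturbed by N(0, k^2) noise (assumed distance)."""
    mp, vp = noisy_gaussian(mu_p, var_p, k)
    mq, vq = noisy_gaussian(mu_q, var_q, k)
    return kl_gaussian(mp, vp, mq, vq)


if __name__ == "__main__":
    # Hypothetical example: p ~ N(0, 1), q ~ N(2, 4).
    for k in (0.0, 1.0, 5.0, 25.0, 100.0):
        print(f"k={k:6.1f}  KL={kl_under_noise(0.0, 1.0, 2.0, 4.0, k):.6f}")
```

Under this assumption the divergence tends to 0 as $k$ grows: both variances approach $k^2$, so the log-variance term vanishes, and the fixed mean gap $(\mu_p - \mu_q)^2$ is divided by an ever larger variance.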

Figures (7)

Figure 1: Example of the Laplace distribution for different values of $k$. Smaller values of $k$ centre the noise more strongly around $0$.
Figure 2: Pipeline example for measuring the robustness of a model $f$ for a single input $\boldsymbol{x}_1$ with $K=3$. Since there is only one sample, i.e., $N=1$, the first sum over the data set can be dropped from the robustness ratio. The input is perturbed slightly, and the function $M$ measures the fairness for both the original input and the perturbed inputs $\tilde{\boldsymbol{x}}_{1,i}$. The fairness measured on the perturbed inputs is then compared with that of the original input. Protected features are labelled in bold.
Figure 3: Fairness and robustness of fairness for various strategies using the Adult Income data set with stochastic gradient descent.
Figure 4: Fairness and robustness of fairness for various strategies using the COMPAS data set with decision tree classification.
Figure 5: Fairness and robustness of fairness for various strategies using the Bank Marketing data set with naïve Bayes.
...and 2 more figures
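Figures 1 and 2 describe perturbing inputs with Laplace noise of scale $k$ and comparing fairness before and after the perturbation. The exact formula for $R_k$ is not given in this summary, so the sketch below is a simplified, data-set-level stand-in under stated assumptions: a binary protected attribute that is never perturbed, Laplace noise added to every other feature, a hypothetical fairness score playing the role of $M$ (one minus the demographic parity gap), and $R_k$ taken as the mean ratio of perturbed to clean fairness over $K$ draws. All names (`laplace_noise`, `dp_fairness`, `robustness_ratio`) and the toy data are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical record layout: (non-protected features, protected attribute a in {0, 1}).
Row = Tuple[List[float], int]
Model = Callable[[List[float], int], int]


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw from Laplace(0, scale) as the difference of two i.i.d. exponentials with mean `scale`."""
    if scale == 0.0:
        return 0.0
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_fairness(preds: Sequence[int], groups: Sequence[int]) -> float:
    """Hypothetical score M = 1 - |P(y_hat=1 | a=0) - P(y_hat=1 | a=1)|; 1 means perfect parity."""
    rates = []
    for g in (0, 1):
        member = [p for p, a in zip(preds, groups) if a == g]
        rates.append(sum(member) / len(member))
    return 1.0 - abs(rates[0] - rates[1])


def robustness_ratio(model: Model, data: Sequence[Row], k: float, n_perturb: int, seed: int = 0) -> float:
    """Mean ratio of fairness on Laplace-perturbed copies of the data to fairness on the clean data.

    Only the non-protected features are perturbed; the protected attribute is kept fixed.
    """
    rng = random.Random(seed)
    groups = [a for _, a in data]
    clean = dp_fairness([model(x, a) for x, a in data], groups)
    if clean == 0.0:
        raise ValueError("clean fairness score is zero; the ratio is undefined")
    total = 0.0
    for _ in range(n_perturb):
        noisy_preds = [model([v + laplace_noise(k, rng) for v in x], a) for x, a in data]
        total += dp_fairness(noisy_preds, groups) / clean
    return total / n_perturb


if __name__ == "__main__":
    data_rng = random.Random(42)
    # Toy data: group 1 has slightly larger first-feature values on average.
    toy = [([data_rng.gauss(0.3 * a, 1.0), data_rng.gauss(0.0, 1.0)], a)
           for a in (0, 1) for _ in range(500)]
    threshold_model: Model = lambda x, a: int(x[0] + x[1] > 0.5)
    for k in (0.0, 0.1, 0.5, 1.0, 2.0):
        print(f"k={k:.1f}  R_k={robustness_ratio(threshold_model, toy, k, n_perturb=10):.3f}")
```

With no noise the ratio is exactly 1; in this sketch, values below 1 indicate fairness lost under perturbation and values above 1 fairness gained. Sampling Laplace noise as a difference of two exponentials keeps the example within the Python standard library.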

Theorems & Definitions (28)

Definition: Demographic Parity
Definition: False Positive Rate
Definition: True Positive Rate
Definition: Equalised Odds
Definition: Fairness Set
Definition: Baseline $f_{base}$
Definition: Pre-processing $f_{pre}$
Definition: In-processing $f_{in}$
Definition: Post-processing $f_{post}$
Definition: Function Set
...and 18 more