Exposing the Illusion of Fairness: Auditing Vulnerabilities to Distributional Manipulation Attacks

Valentin Lafargue, Adriana Laurindo Monteiro, Emmanuelle Claeys, Laurent Risser, Jean-Michel Loubes

TL;DR

This work investigates to what extent a malicious auditee can construct a fairness-compliant yet representative-looking sample from a non-compliant original distribution, thereby creating an illusion of fairness.

Abstract

The rapid deployment of AI systems in high-stakes domains, including those classified as high-risk under the EU AI Act (Regulation (EU) 2024/1689), has intensified the need for reliable compliance auditing. For binary classifiers, regulatory risk assessment often relies on global fairness metrics such as the Disparate Impact ratio, widely used to evaluate potential discrimination. In typical auditing settings, the auditee provides a subset of its dataset to an auditor, while a supervisory authority may verify whether this subset is representative of the full underlying distribution. In this work, we investigate to what extent a malicious auditee can construct a fairness-compliant yet representative-looking sample from a non-compliant original distribution, thereby creating an illusion of fairness. We formalize this problem as a constrained distributional projection task and introduce mathematically grounded manipulation strategies based on entropic and optimal transport projections. These constructions characterize the minimal distributional shift required to satisfy fairness constraints. To counter such attacks, we formalize representativeness through distributional-distance-based statistical tests and systematically evaluate their ability to detect manipulated samples. Our analysis highlights the conditions under which fairness manipulation can remain statistically undetected and provides practical guidelines for strengthening supervisory verification. We validate our theoretical findings through experiments on standard tabular datasets for bias detection. Code is publicly available at https://github.com/ValentinLafargue/Inspection.
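For a binary classifier and a binary sensitive attribute $S$, the Disparate Impact ratio compares the rates of positive predictions across the two groups. The sketch below is a minimal illustration, assuming $S = 0$ denotes the protected group and using the commonly cited four-fifths (0.8) threshold as an illustrative compliance cut-off; neither choice is stated in the abstract, and `disparate_impact` is a hypothetical helper name, not a function from the paper's repository.

```python
import numpy as np

def disparate_impact(y_pred, s):
    """Disparate Impact ratio P(Y_hat = 1 | S = 0) / P(Y_hat = 1 | S = 1).

    Assumption: y_pred and s are binary arrays in {0, 1}, and S = 0 is
    treated as the protected group (hypothetical convention for this sketch).
    """
    y_pred = np.asarray(y_pred)
    s = np.asarray(s)
    rate_protected = y_pred[s == 0].mean()   # positive-prediction rate in S = 0
    rate_reference = y_pred[s == 1].mean()   # positive-prediction rate in S = 1
    return rate_protected / rate_reference

# Toy sample: 2 of 5 positives in group S = 0, 4 of 5 positives in group S = 1.
s = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
y_pred = np.array([1, 1, 0, 0, 0, 1, 1, 1, 1, 0])
di = disparate_impact(y_pred, s)
print(round(di, 3))   # 0.5
print(di >= 0.8)      # False: fails the illustrative four-fifths threshold
```

An auditee trying to "fair-wash" this sample would need to submit a subset whose ratio clears the threshold while still looking like the original data.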

Paper Structure

This paper contains 52 sections, 4 theorems, 41 equations, 10 figures, 15 tables, 3 algorithms.

Key Result

Theorem 1 (Entropic Projection under constraint)

Let $t \in \mathbb{R}^k$ and let $\Phi : E \to \mathbb{R}^k$ be measurable. Assume that $t$ can be written as a convex combination of $\Phi(X_1,\hat{Y}_1,Y_1), \ldots, \Phi(X_n,\hat{Y}_n,Y_n)$ with positive weights. Assume also that the empirical covariance matrix of $\Phi(X)$ is invertible. Let […] exists and is unique. It can also be computed as […] with, for $i \in \{1,\ldots,n\}$, $\lambda^{(t)}_i$ […]
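The theorem statement is truncated in this summary. Classical results of this type (entropic projection of an empirical distribution onto a moment constraint $\mathbb{E}[\Phi] = t$) express the projected distribution as a reweighting of the $n$ sample points with Gibbs weights $\lambda_i \propto \exp(\langle \xi, \Phi(X_i,\hat{Y}_i,Y_i) \rangle)$, where $\xi$ minimizes the convex dual $H(\xi) = \log \sum_j \exp(\langle \xi, \Phi_j \rangle) - \langle \xi, t \rangle$; the two assumptions listed above (positive convex combination, invertible covariance) are what make that dual minimizer exist and be unique. The NumPy sketch below assumes the theorem takes this standard form; `entropic_projection` is a hypothetical helper, and the code is not claimed to reproduce the paper's Proposition 1 ("KL-fair washing method") or its repository.

```python
import numpy as np

def entropic_projection(phi, t, n_iter=100, tol=1e-10):
    """Weights w on the sample minimizing KL(w || uniform) s.t. w @ phi = t.

    phi : (n, k) array, row i is Phi(X_i, Y_hat_i, Y_i)
    t   : (k,) target, assumed to be a positive convex combination of the rows
    Assumes the standard Gibbs form w_i proportional to exp(<xi, phi_i>).
    """
    phi = np.asarray(phi, dtype=float)
    t = np.asarray(t, dtype=float)
    xi = np.zeros(phi.shape[1])

    def gibbs(xi):
        z = phi @ xi
        w = np.exp(z - z.max())          # shift by the max for numerical stability
        return w / w.sum()

    def dual(xi):
        # H(xi) = log sum_j exp(<xi, phi_j>) - <xi, t>, computed stably
        z = phi @ xi
        m = z.max()
        return m + np.log(np.exp(z - m).sum()) - xi @ t

    for _ in range(n_iter):
        w = gibbs(xi)
        mean = w @ phi
        grad = mean - t                  # gradient of H: weighted mean minus target
        if np.linalg.norm(grad) < tol:
            break
        centered = phi - mean
        hess = centered.T @ (w[:, None] * centered)   # weighted covariance = Hessian of H
        step = np.linalg.solve(hess, grad)
        # Damped Newton: Armijo backtracking, with a tiny slack that absorbs
        # floating-point noise once H is flat near the optimum.
        alpha, h0, slope = 1.0, dual(xi), grad @ step
        while alpha > 1e-8 and dual(xi - alpha * step) > h0 - 1e-4 * alpha * slope + 1e-12:
            alpha *= 0.5
        xi = xi - alpha * step
    return gibbs(xi)

# Usage: reweight a toy sample so that its Disparate Impact reaches 0.8,
# keeping P(S = 0) = 0.5 and P(Y_hat = 1, S = 1) = 0.4 at their empirical values.
s = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
y = np.array([1, 1, 0, 0, 0, 1, 1, 1, 1, 0])
phi = np.column_stack([s == 0, y * (s == 0), y * (s == 1)]).astype(float)
w = entropic_projection(phi, t=[0.5, 0.32, 0.4])
rate0 = w[(s == 0) & (y == 1)].sum() / w[s == 0].sum()
rate1 = w[(s == 1) & (y == 1)].sum() / w[s == 1].sum()
print(round(rate0 / rate1, 3))   # 0.8
```

Only the mass of positive predictions in group $S = 0$ is shifted (from 0.2 to 0.32), so this reweighting is the smallest change in KL divergence that meets the chosen moment constraints; sampling a subset according to `w` is one way such weights could be turned into a submitted sample.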

Figures (10)

  • Figure 1: Admissible modifications of $\tau : \{0,1\}^2 \to \{0,1\}^2$ that increase Disparate Impact using the Replace method.
  • Figure 2: Radar graph ranking $D_{\mathrm{KL}}$ and $W$ similarity results by fair-washing method (lower is better). The visualization highlights why $M_{W(X,S,\hat{Y})}$, whether or not $X$ is included in the similarity measure, provides the most balanced overall performance.
  • Figure 3: Line plot illustrating the trade-off between fairness correction and distribution shift on the Adult dataset (measured on $X$ for the Wasserstein distance and on $(X, S, \hat{Y})$ for $D_{\mathrm{KL}}$). The KL divergence and Wasserstein distance between the fully modified and original datasets are reported for each method. Methods with infinite $D_{\mathrm{KL}}$ are omitted.
  • Figure 4: Distribution shift, measured by Wasserstein distance, KL divergence and MMD, when constraining the Equality of Odds (EoO) fairness metric on the Adult dataset with the $M_{W(X,S,Y)}$ method.
  • Figure 5: Original (Old) and latest (New) results for their synthetic datasets, obtained over 30 runs for several values of $\alpha$, confirm a negligible impact on performance, with no observable difference between the two versions.
  • ...and 5 more figures
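The figures quantify distribution shift with the Wasserstein distance, KL divergence and MMD. As an illustration of how such a distance can be turned into a representativeness test, the sketch below runs a standard permutation two-sample test on the squared MMD with a Gaussian kernel. The kernel, the bandwidth of 1.0, the number of permutations, and the treatment of the submitted sample and the reference data as two independent samples are all assumptions of this sketch, not the paper's test protocol; `mmd2` and `mmd_permutation_test` are hypothetical helper names.

```python
import numpy as np

def mmd2(x, y, bandwidth=1.0):
    """Biased estimate of the squared MMD between samples x and y (Gaussian kernel)."""
    def k(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
        return np.exp(-d2 / (2 * bandwidth ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

def mmd_permutation_test(x, y, n_perm=500, bandwidth=1.0, seed=0):
    """Return (statistic, p-value) for H0: x and y share the same distribution."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = mmd2(x, y, bandwidth)
    pooled = np.vstack([x, y])
    n = len(x)
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))          # random relabelling under H0
        if mmd2(pooled[idx[:n]], pooled[idx[n:]], bandwidth) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)    # add-one permutation p-value

# Usage on synthetic data (illustrative only).
rng = np.random.default_rng(1)
reference = rng.normal(size=(200, 2))            # stands in for the full dataset
same = rng.normal(size=(100, 2))                 # drawn from the same distribution
shifted = rng.normal(loc=1.0, size=(100, 2))     # mean-shifted "manipulated" sample
print(mmd_permutation_test(same, reference))     # H0 should typically not be rejected
print(mmd_permutation_test(shifted, reference))  # H0 should typically be rejected
```

A manipulated sample passes such a check when its shift from the reference is too small for the test to detect at the chosen sample size, which is the regime the paper's analysis characterizes.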

Theorems & Definitions (13)

  • Remark 1
  • Remark 2
  • Theorem 1: Entropic Projection under constraint
  • Proposition 1: KL fair-washing method
  • Remark 3
  • Theorem 2
  • Remark 4
  • Remark 5
  • Proposition 2
  • Proof
  • ...and 3 more