Rethinking Fair Representation Learning for Performance-Sensitive Tasks

Charles Jones; Fabio de Sousa Ribeiro; Mélanie Roschewitz; Daniel C. Castro; Ben Glocker

TL;DR

This work interrogates the validity of fair representation learning (FRL) for performance-sensitive tasks. It uses a causal framework to expose the implicit assumptions and limitations of FRL in two settings: when training and test data share the same distribution, and under distribution shift. It organises the fairness literature into three paradigms (group parity, iid performance optimisation, and generalisation to unbiased distributions) and develops a formal causal account of dataset bias in which the input X is decomposed into task-related features X_Z and sensitive features X_A. The authors prove fundamental limitations of FRL in iid settings and propose two hypotheses for its potential validity under distribution shift. Experiments across a range of medical imaging modalities indicate that the benefits of FRL are conditional on the bias structure and on subgroup separability. The results call for explicit bias analysis and careful, domain-aware evaluation before fairness methods are deployed in real-world, high-stakes applications.

Abstract

We investigate the prominent class of fair representation learning methods for bias mitigation. Using causal reasoning to define and formalise different sources of dataset bias, we reveal important implicit assumptions inherent to these methods. We prove fundamental limitations on fair representation learning when evaluation data is drawn from the same distribution as training data and run experiments across a range of medical modalities to examine the performance of fair representation learning under distribution shifts. Our results explain apparent contradictions in the existing literature and reveal how rarely considered causal and statistical aspects of the underlying data affect the validity of fair representation learning. We raise doubts about current evaluation practices and the applicability of fair representation learning methods in performance-sensitive settings. We argue that fine-grained analysis of dataset biases should play a key role in the field moving forward.
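To make the idea of distinct sources of dataset bias concrete, the following is a minimal sketch of a toy generative model for three of the settings the paper names: the unbiased case, differences in base rates across subgroups, and differences in labelling policy across subgroups. It is not the authors' formalism or code. The function names (`sample`, `label_gap`) and every probability are assumptions chosen for illustration, and feature entanglement is omitted because it concerns the input features rather than the labels.

```python
import random


def sample(mechanism, n=20000, seed=0):
    """Draw (a, z, y) triples under a toy version of one bias mechanism.

    a: sensitive attribute, z: underlying (unobserved) class, y: observed label.
    All probabilities are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        a = rng.random() < 0.5
        if mechanism == "base_rate":
            # Prevalence of the underlying class differs across subgroups.
            z = rng.random() < (0.7 if a else 0.3)
        else:
            z = rng.random() < 0.5
        y = z
        if mechanism == "labelling" and a and z:
            # Positive cases in subgroup a=1 are mislabelled as negative
            # 40% of the time (a subgroup-specific labelling policy).
            y = rng.random() >= 0.4
        rows.append((int(a), int(z), int(y)))
    return rows


def positive_rate(rows, a):
    """Empirical P(Y = 1 | A = a)."""
    ys = [y for (ai, _, y) in rows if ai == a]
    return sum(ys) / len(ys)


def label_gap(rows):
    """P(Y=1 | A=1) - P(Y=1 | A=0); near zero means A says nothing about Y."""
    return positive_rate(rows, 1) - positive_rate(rows, 0)


if __name__ == "__main__":
    for mechanism in ("unbiased", "base_rate", "labelling"):
        print(mechanism, round(label_gap(sample(mechanism)), 3))
```

In this toy model the label gap is close to zero only in the unbiased setting. The other two mechanisms both make the sensitive attribute predictive of the observed label, but for different causal reasons, which is the kind of distinction the paper argues should be analysed explicitly.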

Paper Structure (27 sections, 9 theorems, 20 equations, 8 figures, 1 table)


Introduction
Three paradigms of group fairness analysis
Enforcing group parity
Maximising (subgroup-wise) iid performance
Generalising to unbiased distributions
Causal structures of dataset bias
Rethinking fair representations
Futility in the iid performance paradigm
Preliminaries
Potential validity in the distribution shift paradigm
Experiments and results
Verifying futility in the iid performance paradigm (Proposition 4.8)
Testing potential validity under causal shifts (Hypothesis 4.9)
Testing potential validity as a function of subgroup separability (Hypothesis 4.10)
Discussion
...and 12 more sections

Key Result

Lemma 4.0

Fair representations must depend on $X_Z$ only:

Figures (8)

Figure 1: Causal structures of dataset bias in classification tasks. The input ${\mathbf{X}}$ is decomposed into latent features $\{X_Z, X_A\}$ based on their causal relationships with the sensitive attribute $A$ and (unobserved) underlying class $Z$. In the unbiased setting (a), sensitive information is irrelevant to predicting the target $Y$. This condition may be violated by (b) feature entanglement of $A$ and $Z$, (c) differences in base rates across subgroups, or (d) differences in labelling policy across subgroups.
Figure 2: Percentage-point mean accuracy gap for FRL models compared to ERM models on iid disease classification tasks (train/test unbiased). Positive $\Delta$ Acc means FRL outperforms ERM. Datasets are sorted by increasing subgroup separability on the x-axis.
Figure 3: Percentage-point mean accuracy gap for FRL models compared to ERM models when trained on each mechanism of dataset bias (test set is always unbiased). Positive $\Delta$ Acc indicates that FRL outperforms ERM on the unbiased test set.
Figure 4: Percentage-point mean accuracy gap for FRL models compared to ERM models, aggregated over all bias mechanisms and plotted against subgroup separability AUC, as reported by Jones et al. (2023). Positive $\Delta$ Acc indicates that FRL outperforms ERM on the unbiased test set. We use Kendall's $\tau$ statistic to test for a monotonic association between $\Delta$ Acc and subgroup separability. $y$-axis error bars represent standard deviations of the aggregated $\Delta$ Acc measurements. $x$-axis error bars represent standard deviations in subgroup separability measurements.
Figure 5: Two further examples of bias mechanisms to which Proposition 4.8 applies. Left: a causal structure (i.e. $\mathbf{X} \rightarrow Y$) in which different groups with the same $X_Z$ features are annotated differently. Right: a structure that includes an interaction feature $X_{A \land Z}$, acting as a collider for $A$ and $Z$. Any model that implicitly conditions on the $X_{A \land Z}$ feature will see a spurious correlation between $A$ and $Z$.
...and 3 more figures
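
The captions for Figures 2 to 4 rely on two quantities: the percentage-point accuracy gap $\Delta$ Acc between FRL and ERM, and Kendall's $\tau$ as a test for monotonic association with subgroup separability. The sketch below shows one way these could be computed. It is an illustration, not the authors' evaluation code: the helper names are hypothetical, and because the captions do not say which variant of $\tau$ was used, the simple tau-a form (no tie correction, no p-value) is assumed here.

```python
from itertools import combinations


def delta_acc(frl_acc, erm_acc):
    """Percentage-point accuracy gap; positive means FRL outperforms ERM.

    Both arguments are accuracies expressed as fractions in [0, 1].
    """
    return 100.0 * (frl_acc - erm_acc)


def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs.

    Tied pairs count as neither concordant nor discordant, and no tie
    correction is applied (an assumption; other variants handle ties).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    score = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            score += 1  # pair ordered the same way in both sequences
        elif product < 0:
            score -= 1  # pair ordered in opposite ways
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return score / n_pairs
```

Given one subgroup separability AUC and one aggregated $\Delta$ Acc value per dataset, `kendall_tau_a(separability, gaps)` returns a value in [-1, 1], where negative values indicate that the gap tends to shrink as separability grows. A significance test would need an additional step, for example a library routine such as `scipy.stats.kendalltau`.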

Theorems & Definitions (17)

Definition 3.0: Unbiased distribution
Lemma 4.0
Lemma 4.0
Definition 4.0: Effectiveness
Definition 4.0: Harmlessness
Lemma 4.0
Lemma 4.0
Proposition 4.0: Futility
proof
Lemma A.0
...and 7 more
