10 Years of Fair Representations: Challenges and Opportunities
Mattia Cerrato, Marius Köppel, Philipp Wolf, Stefan Kramer
TL;DR
This paper revisits a decade of Fair Representation Learning (FRL). It formalizes the goal of compressing $X$ into a representation $Z$ that minimizes $I(Z; S)$ while preserving $I(Z; Y)$, and highlights the fundamental trade-off through two formulations: the weighted objective $\min_{\theta} (1-\gamma) \mathcal{L}_{class}(\theta) + \gamma \mathcal{L}_{fair}(\theta)$ and the constrained mutual-information problem $\min_{Z} I(Z; S)$ subject to $I(Z; Y) \ge \alpha$. It introduces a theoretical impossibility result for deterministic, infinite-precision FRL with injective activations, showing that $I(S; Z^i)=I(S; X)$ at every layer $i$. This is accompanied by a large-scale empirical evaluation using EvalFRL on six datasets and eight FRL methods, in which AutoML is used to probe the learned representations for residual dependence on $S$. The results indicate that many deterministic FRL methods fail to remove sensitive information from learned representations, while stochastic or quantized variants can achieve stronger invariance, consistent with the impossibility result. The work argues for rigorous, dual-frame evaluation, transparent reporting, and a shift toward stochastic/quantized approaches and physics-informed datasets to realize FRL’s real-world fairness potential.
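The intuition behind the impossibility result can be illustrated on a toy discrete example. The sketch below is not the paper's proof or code: the data, the helper `mutual_information`, and the maps `injective` and `quantized` are all hypothetical and chosen only for illustration. It estimates mutual information from samples and shows that a one-to-one (injective) deterministic map leaves $I(S; Z)$ equal to $I(S; X)$, whereas a many-to-one (quantizing) map can reduce it.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Empirical mutual information I(A; B) in bits from (a, b) samples."""
    n = len(pairs)
    joint = Counter(pairs)
    count_a = Counter(a for a, _ in pairs)
    count_b = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n
        # p(a, b) / (p(a) * p(b)) written in terms of raw counts
        mi += p_ab * math.log2(p_ab * n * n / (count_a[a] * count_b[b]))
    return mi


# Toy data (assumption): x takes values 0..7 uniformly, and the sensitive
# attribute s is a deterministic function of x.
xs = list(range(8)) * 10
ss = [int(x >= 4) for x in xs]


def injective(x):
    """Hypothetical one-to-one deterministic 'layer'."""
    return 3 * x + 1


def quantized(x):
    """Hypothetical many-to-one map: merges x and x + 4, which differ in s."""
    return x % 4


mi_input = mutual_information(list(zip(ss, xs)))
mi_injective = mutual_information([(s, injective(x)) for s, x in zip(ss, xs)])
mi_quantized = mutual_information([(s, quantized(x)) for s, x in zip(ss, xs)])

print(f"I(S; X)            = {mi_input:.3f} bits")
print(f"I(S; injective(X)) = {mi_injective:.3f} bits")
print(f"I(S; quantized(X)) = {mi_quantized:.3f} bits")
```

In this toy setting the injective map only relabels the values of $X$, so no information about $S$ is lost, while the quantizing map is constructed so that each output value is shared by both groups. Real representations are continuous and high-dimensional, so this is only an analogy for the argument.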
Abstract
Fair Representation Learning (FRL) is a broad set of techniques, mostly based on neural networks, that seeks to learn new representations of data in which sensitive or undesired information has been removed. Methodologically, FRL was pioneered by Richard Zemel et al. about ten years ago. The basic concepts, objectives and evaluation strategies for FRL methodologies remain unchanged to this day. In this paper, we look back at the first ten years of FRL by i) revisiting its theoretical standing in light of recent work in deep learning theory that shows the hardness of removing information in neural network representations and ii) presenting the results of a large-scale experimental study (225,000 model fits and 110,000 AutoML fits) that we conducted with the objective of improving on the common evaluation scenario for FRL. More specifically, we use automated machine learning (AutoML) to adversarially "mine" sensitive information from supposedly fair representations. Our theoretical and experimental analysis suggests that deterministic, unquantized FRL methodologies have serious issues in removing sensitive information, which is especially troubling as they might seem "fair" at first glance.
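The idea of adversarially "mining" sensitive information can be sketched as a probing experiment: fit a fresh classifier that predicts $S$ from the representation $Z$ and compare its held-out accuracy with a majority-class baseline. The sketch below is a minimal stand-in under stated assumptions, not the paper's pipeline: it swaps AutoML for a simple nearest-centroid probe, and the synthetic representations `z_leaky` and `z_clean` are hypothetical.

```python
import random


def nearest_centroid_probe(z_train, s_train, z_test, s_test):
    """Fit a nearest-centroid classifier on (z, s); return held-out accuracy."""
    dims = len(z_train[0])
    centroids = {}
    for label in set(s_train):
        rows = [z for z, s in zip(z_train, s_train) if s == label]
        centroids[label] = [sum(r[d] for r in rows) / len(rows) for d in range(dims)]

    def predict(z):
        # Assign z to the label whose centroid is closest in squared distance.
        return min(
            centroids,
            key=lambda c: sum((z[d] - centroids[c][d]) ** 2 for d in range(dims)),
        )

    correct = sum(predict(z) == s for z, s in zip(z_test, s_test))
    return correct / len(z_test)


def majority_baseline(s_train, s_test):
    """Accuracy of always predicting the most frequent training label."""
    majority = max(set(s_train), key=s_train.count)
    return sum(s == majority for s in s_test) / len(s_test)


rng = random.Random(0)
s = [rng.randint(0, 1) for _ in range(400)]

# Hypothetical "leaky" representation: the first coordinate shifts with s.
z_leaky = [[si * 2.0 + rng.gauss(0, 0.5), rng.gauss(0, 1)] for si in s]
# Hypothetical invariant representation: drawn independently of s.
z_clean = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in s]

split = 300  # first 300 samples train the probe, the rest evaluate it
baseline = majority_baseline(s[:split], s[split:])
acc_leaky = nearest_centroid_probe(z_leaky[:split], s[:split], z_leaky[split:], s[split:])
acc_clean = nearest_centroid_probe(z_clean[:split], s[:split], z_clean[split:], s[split:])

print(f"majority baseline:      {baseline:.2f}")
print(f"probe on leaky Z:       {acc_leaky:.2f}")
print(f"probe on invariant Z:   {acc_clean:.2f}")
```

A probe accuracy well above the baseline indicates that $S$ is still recoverable from $Z$. The converse does not hold: a weak probe failing to recover $S$ says little, which is the motivation for using a stronger, automated model search as the adversary.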
