When Are Learning Biases Equivalent? A Unifying Framework for Fairness, Robustness, and Distribution Shift

Sushant Mehta

TL;DR

The paper tackles the problem of disparate bias phenomena in ML by proposing a unifying information-theoretic framework, defining bias as $\mathcal{B}(f; \mathcal{D}) = I(\hat{Y}; A \mid Y)$, and proving formal equivalence conditions that connect spurious correlations, subpopulation shifts, and fairness violations. It provides a concrete corollary linking spurious correlation strength $\alpha$ to an equivalent imbalance ratio $r$ and demonstrates that, under feature overlap and smooth loss assumptions, worst-group accuracy differences between equivalent problems are bounded by $\delta(\epsilon, \eta) = O(\sqrt{\epsilon}/\eta)$. The authors validate the theory across six datasets and three architectures, showing that predicted equivalences hold within about 3% for worst-group accuracy and enabling effective transfer of debiasing methods with minimal retraining. This work offers a principled path to diagnose, compare, and transfer bias mitigation techniques across fairness, robustness, and distribution-shift domains, potentially accelerating practical improvements in real-world ML systems.
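When predictions, attributes, and labels are all discrete, the bias measure $\mathcal{B}(f; \mathcal{D}) = I(\hat{Y}; A \mid Y)$ can be estimated directly from samples. The following is a minimal plug-in sketch, not the paper's estimator: the function name and the toy data are illustrative assumptions, and plug-in estimates of this kind tend to overestimate the true value on small samples.

```python
from collections import Counter
from math import log


def conditional_mutual_information(y_hat, a, y):
    """Plug-in estimate of I(Y_hat; A | Y) in nats from discrete samples.

    Uses the identity
        I(Y_hat; A | Y) = sum p(yh, a, y) * log( p(y) p(yh, a, y) / (p(yh, y) p(a, y)) )
    with every probability replaced by its empirical frequency.
    """
    n = len(y)
    if not (len(y_hat) == len(a) == n) or n == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    joint = Counter(zip(y_hat, a, y))   # counts of (prediction, attribute, label)
    pred_label = Counter(zip(y_hat, y))  # counts of (prediction, label)
    attr_label = Counter(zip(a, y))      # counts of (attribute, label)
    label = Counter(y)                   # counts of label

    total = 0.0
    for (p, g, t), n_pgt in joint.items():
        # The sample size n cancels inside the logarithm, so raw counts suffice.
        ratio = n_pgt * label[t] / (pred_label[(p, t)] * attr_label[(g, t)])
        total += (n_pgt / n) * log(ratio)
    return total


if __name__ == "__main__":
    # Hypothetical toy data: balanced labels and attributes.
    labels = [0, 0, 0, 0, 1, 1, 1, 1]
    attrs = [0, 0, 1, 1, 0, 0, 1, 1]

    # A predictor that copies the attribute is maximally biased under this measure.
    print(conditional_mutual_information(attrs, attrs, labels))

    # A predictor that is independent of the attribute within each label has zero bias.
    unbiased_preds = [0, 1, 0, 1, 0, 1, 0, 1]
    print(conditional_mutual_information(unbiased_preds, attrs, labels))
```

A value of zero corresponds to the conditional independence $\hat{Y} \perp A \mid Y$; larger values mean the prediction carries more information about the attribute beyond what the label explains.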

Abstract

Machine learning systems exhibit diverse failure modes (unfairness toward protected groups, brittleness to spurious correlations, and poor performance on minority subpopulations) that are typically studied in isolation by distinct research communities. We propose a unifying theoretical framework that characterizes when different bias mechanisms produce quantitatively equivalent effects on model performance. By formalizing biases as violations of conditional independence through information-theoretic measures, we prove formal equivalence conditions relating spurious correlations, subpopulation shift, class imbalance, and fairness violations. Our theory predicts that a spurious correlation of strength $\alpha$ produces the same worst-group accuracy degradation as a subpopulation imbalance ratio $r \approx (1+\alpha)/(1-\alpha)$ under feature overlap assumptions. Empirical validation across six datasets and three architectures confirms that the predicted equivalences hold to within 3% in worst-group accuracy, enabling the principled transfer of debiasing methods across problem domains. This work bridges the literature on fairness, robustness, and distribution shift under a common perspective.
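The stated relation $r \approx (1+\alpha)/(1-\alpha)$ is easy to evaluate numerically. The sketch below converts between the two quantities; the function names are hypothetical, and the restriction $0 \le \alpha < 1$ is an assumption made here so that the ratio is finite and at least 1, not a range stated in the abstract. The inverse $\alpha = (r-1)/(r+1)$ follows by solving the same relation for $\alpha$.

```python
def equivalent_imbalance_ratio(alpha: float) -> float:
    """Imbalance ratio r matched to spurious correlation strength alpha.

    Implements r = (1 + alpha) / (1 - alpha), the approximate relation
    quoted in the abstract. The domain 0 <= alpha < 1 is an assumption.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha is assumed to lie in [0, 1)")
    return (1.0 + alpha) / (1.0 - alpha)


def equivalent_spurious_strength(r: float) -> float:
    """Inverse map: alpha = (r - 1) / (r + 1), assuming r >= 1."""
    if r < 1.0:
        raise ValueError("r is assumed to be at least 1")
    return (r - 1.0) / (r + 1.0)


if __name__ == "__main__":
    # Illustrative strengths only; these are not values reported in the paper.
    for alpha in (0.0, 0.5, 0.8, 0.95):
        r = equivalent_imbalance_ratio(alpha)
        print(f"alpha = {alpha:.2f}  ->  r = {r:.2f}")
```

For example, $\alpha = 0.5$ gives $r = 1.5 / 0.5 = 3$, and the ratio grows without bound as $\alpha$ approaches 1, so strong spurious correlations map to very severe imbalance.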

Paper Structure

This paper contains 15 sections, 2 theorems, 8 equations, 5 tables.

Key Result

Theorem 2

Consider two learning problems $(\mathcal{D}_1, A_1)$ and $(\mathcal{D}_2, A_2)$ with the same feature space $\mathcal{X}$ and label space $Y$, but different attributes $A_1, A_2$. Under smoothness assumptions on the loss $\ell$ and feature overlap condition $\eta = \min_{y} \int \min(p_1(x|y), p_2(x|y))\,dx$, [...] implies worst-group accuracy differs by at most $\delta(\epsilon, \eta)$, where $\delta(\epsilon, \eta) = O(\sqrt{\epsilon}/\eta)$.

Theorems & Definitions (3)

  • Definition 1: Bias
  • Theorem 2: Bias Equivalence
  • Corollary 3: Spurious Correlation $\leftrightarrow$ Imbalance