An Effective Theory of Bias Amplification
Arjun Subramonian, Samuel J. Bell, Levent Sagun, Elvis Dohmatob
TL;DR
The paper studies bias amplification in machine learning by building a unifying theory for ridge regression, with and without random projections, on data modeled as a two-group Gaussian mixture. It uses operator-valued free probability theory to obtain deterministic equivalents for the groupwise test-risk disparities (EDD, ODD) and the amplification metric ADD across parameterization regimes, data-covariance structures, and group sizes. The authors derive a bias–variance decomposition R_s(f̂) ≈ B_s(f̂) + V_s(f̂) for each group s, with B_s and V_s expressed through fixed-point scalars that capture interactions between the group covariances. They validate the theory with extensive synthetic and semi-synthetic experiments, including isotropic covariances and Colored MNIST. The work reveals phase transitions and regimes in which regularization or early stopping can mitigate bias, and it offers actionable insights for evaluating and mitigating unfairness in ML, such as how overparameterization and feature composition affect minority-group performance.
Abstract
Machine learning models can capture and amplify biases present in data, leading to disparate test performance across social groups. To better understand, evaluate, and mitigate these biases, a deeper theoretical understanding of how model design choices and data distribution properties contribute to bias is needed. In this work, we contribute a precise analytical theory in the context of ridge regression, both with and without random projections, where the former models feedforward neural networks in a simplified regime. Our theory offers a unified and rigorous explanation of machine learning bias, providing insights into phenomena such as bias amplification and minority-group bias in various feature and parameter regimes. For example, we observe that there may be an optimal regularization penalty or training time to avoid bias amplification, and there can be differences in test error between groups that are not alleviated with increased parameterization. Importantly, our theoretical predictions align with empirical observations reported in the literature on machine learning bias. We extensively empirically validate our theory on synthetic and semi-synthetic datasets.
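To make the setting concrete, the following is a minimal simulation sketch, not the authors' code. It draws data from a two-group Gaussian mixture with isotropic covariances, fits a single ridge regressor (optionally on randomly projected features), and reports the test risk of that one model on each group. All names (`sample_group`, `ridge_fit`, `group_risks`) and all numerical choices (dimensions, group sizes, covariance scales, noise level, how the two ground-truth weight vectors differ) are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def sample_group(rng, n, d, w, scale, noise_std):
    """Draw n points x ~ N(0, scale * I_d) with labels y = x @ w + noise."""
    X = rng.normal(size=(n, d)) * np.sqrt(scale)
    y = X @ w + noise_std * rng.normal(size=n)
    return X, y


def ridge_fit(Z, y, lam):
    """Closed-form ridge: argmin_v ||Z v - y||^2 / n + lam * ||v||^2."""
    n, p = Z.shape
    return np.linalg.solve(Z.T @ Z / n + lam * np.eye(p), Z.T @ y / n)


def group_risks(seed=0, d=50, n1=800, n2=100, n_test=5000, lam=1e-2, m=None):
    """Train one ridge model on both groups; return [risk_group1, risk_group2].

    If m is given, features are first mapped through a fixed Gaussian random
    projection S of shape (d, m); otherwise the raw features are used.
    """
    rng = np.random.default_rng(seed)
    noise_std = 0.1                      # assumed label-noise level
    scales = (1.0, 2.0)                  # assumed isotropic covariance scales
    w1 = rng.normal(size=d) / np.sqrt(d)
    w2 = w1 + 0.5 * rng.normal(size=d) / np.sqrt(d)  # assumed group shift

    # Majority group (n1 samples) and minority group (n2 samples).
    X1, y1 = sample_group(rng, n1, d, w1, scales[0], noise_std)
    X2, y2 = sample_group(rng, n2, d, w2, scales[1], noise_std)
    X = np.vstack([X1, X2])
    y = np.concatenate([y1, y2])

    # Optional random projection; identity means plain ridge regression.
    S = np.eye(d) if m is None else rng.normal(size=(d, m)) / np.sqrt(d)
    v_hat = ridge_fit(X @ S, y, lam)

    # Monte Carlo estimate of each group's test risk on fresh samples.
    risks = []
    for w, scale in zip((w1, w2), scales):
        Xt, yt = sample_group(rng, n_test, d, w, scale, noise_std)
        risks.append(float(np.mean((Xt @ S @ v_hat - yt) ** 2)))
    return risks


if __name__ == "__main__":
    for m in (None, 25, 200):
        r1, r2 = group_risks(m=m)
        print(f"m={m}: risk group 1 = {r1:.4f}, risk group 2 = {r2:.4f}")
```

Sweeping `lam`, `m`, or the ratio `n2 / n1` in this sketch gives an empirical counterpart to the regimes discussed above. The paper's contribution is to predict such groupwise risks analytically, which this simulation does not attempt.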
