The Statistical Fairness-Accuracy Frontier
Alireza Fallah, Michael I. Jordan, Annie Ulichney
TL;DR
This work analyzes the fairness-accuracy (FA) frontier in the finite-sample regime for two-group regression, extending the population-frontier perspective to practical, data-limited settings. It identifies minimax-optimal estimators under both known and unknown covariances, deriving explicit risk bounds and optimal sampling rules that reflect group heterogeneity. The results show that finite-sample effects shift the fairness-accuracy trade-offs away from those predicted by the population frontier, affecting the two groups asymmetrically and guiding how sampling resources should be allocated. A uniform frontier bound provides high-probability guarantees for the entire FA frontier, so policy and deployment decisions can be based on empirical frontiers with quantified uncertainty.
Abstract
Machine learning models must balance accuracy and fairness, but these goals often conflict, particularly when data come from multiple demographic groups. A useful tool for understanding this trade-off is the fairness-accuracy (FA) frontier, which characterizes the set of models that cannot be simultaneously improved in both fairness and accuracy. Prior analyses of the FA frontier provide a full characterization under the assumption of complete knowledge of population distributions -- an unrealistic ideal. We study the FA frontier in the finite-sample regime, showing how it deviates from its population counterpart and quantifying the worst-case gap between them. In particular, we derive minimax-optimal estimators that depend on the designer's knowledge of the covariate distribution. For each estimator, we characterize how finite-sample effects asymmetrically impact each group's risk, and identify optimal sample allocation strategies. Our results transform the FA frontier from a theoretical construct into a practical tool for policymakers and practitioners who must often design algorithms with limited data.
