f-FERM: A Scalable Framework for Robust Fair Empirical Risk Minimization
Sina Baharlouei, Shivam Patel, Meisam Razaviyayn
TL;DR
This work addresses the challenge of scalable fair empirical risk minimization in settings with potential distribution shifts. It introduces f-FERM, a unified stochastic framework that regularizes the ERM objective with $f$-divergences. The framework exploits Legendre-Fenchel duality to obtain unbiased mini-batch gradients, which yields a stochastic gradient descent-ascent (SGDA) algorithm with a provable iteration complexity. The authors extend f-FERM to distribution shifts via distributionally robust optimization and derive two regimes: small shifts with $\ell_p$ uncertainty sets and large shifts with $\ell_{\infty}$ uncertainty sets. Each regime leads to a tractable optimization procedure and a memory-efficient implementation. Experiments on standard fairness benchmarks, evaluated under fairness notions such as demographic parity (DP), show competitive fairness-accuracy tradeoffs across batch sizes and robust performance under distribution shifts. These results highlight the framework's practical utility for deploying fair ML systems without relying on causal graphs.
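The duality step can be made concrete with a short derivation. This is a sketch under stated assumptions, not a formula quoted from the paper. It assumes the fairness regularizer is an $f$-divergence between the joint distribution of the prediction $\hat y$ and the sensitive attribute $s$ and the product of their marginals. It further assumes finitely many classes $j$ and groups $k$, soft model outputs $F_j(x;\theta)$, a hypothetical dual matrix $A$, a regularization weight $\lambda$, and group frequencies $\pi_k = P(s=k)$ estimated once from the training data. With $D_f(P\|Q)=\sum_z Q(z)\,f\big(P(z)/Q(z)\big)$ and $f(u)=\sup_t\{ut-f^*(t)\}$ for convex, lower semicontinuous $f$:

$$
D_f\big(P_{\hat y,s}\,\|\,P_{\hat y}\otimes P_s\big)
= \sup_{A}\ \sum_{j,k}\Big[A_{jk}\,P(\hat y=j,\,s=k) \;-\; \pi_k\,f^*(A_{jk})\,P(\hat y=j)\Big].
$$

Plugging in $P(\hat y=j,s=k)\approx\frac{1}{n}\sum_i F_j(x_i;\theta)\,\mathbb{1}\{s_i=k\}$ and $P(\hat y=j)\approx\frac{1}{n}\sum_i F_j(x_i;\theta)$ gives the min-max problem

$$
\min_{\theta}\ \max_{A}\ \frac{1}{n}\sum_{i=1}^{n}\Big[\ell\big(F(x_i;\theta),\,y_i\big) \;+\; \lambda\sum_{j,k} F_j(x_i;\theta)\,\big(A_{jk}\,\mathbb{1}\{s_i=k\} - \pi_k\,f^*(A_{jk})\big)\Big].
$$

Because $\pi_k$ does not depend on $\theta$, the objective is a plain average over samples. A uniformly sampled mini-batch therefore gives unbiased gradients in both $\theta$ and $A$, which is what SGDA needs. For KL divergence, $f(u)=u\log u$ and $f^*(t)=e^{t-1}$, and the inner maximizer is $A_{jk}=1+\log\frac{P(\hat y=j,\,s=k)}{\pi_k\,P(\hat y=j)}$.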
Abstract
Training and deploying machine learning models that meet fairness criteria for protected groups are fundamental to modern artificial intelligence. While numerous constraints and regularization terms have been proposed in the literature to promote fairness in machine learning tasks, most of these methods are not amenable to stochastic optimization because of the complex, nonlinear structure of their constraints and regularizers. Here, the term "stochastic" refers to the ability of the algorithm to work with small mini-batches of data. Motivated by this limitation of the existing literature, this paper presents a unified stochastic optimization framework for fair empirical risk minimization based on f-divergence measures (f-FERM). The proposed stochastic algorithm enjoys theoretical convergence guarantees. In addition, our experiments demonstrate that f-FERM offers superior fairness-accuracy tradeoffs for almost all batch sizes (ranging from full-batch to a batch size of one). Moreover, we show that our framework can be extended to the case where the data distribution shifts from training to test. Our extension is based on a distributionally robust optimization reformulation of the f-FERM objective with $L_p$ norms as uncertainty sets. Again, in this distributionally robust setting, f-FERM not only enjoys theoretical convergence guarantees but also outperforms other baselines from the literature on tasks involving distribution shifts. An efficient stochastic implementation of f-FERM is publicly available.
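The mini-batch property described above can be illustrated with a small NumPy sketch. It is not the authors' released implementation, and several choices in it are assumptions:

- The model is logistic regression with a binary label and a binary sensitive attribute.
- The $f$-divergence is KL ($f(u)=u\log u$, with conjugate $f^*(t)=e^{t-1}$).
- The penalty is the divergence between the joint distribution of (soft prediction, sensitive attribute) and the product of its marginals, written in its Fenchel dual form.
- The group frequencies `pi` are treated as constants computed once from the full training set.

All function names (`f_star`, `dual_objective`, `minibatch_grads`, `train_sgda`, `dp_gap`), the step sizes, and the box constraint on the dual variable `A` are hypothetical choices for illustration.

```python
import numpy as np

# Hypothetical sketch: KL is the chosen f-divergence, f(u) = u*log(u), with
# convex conjugate f*(t) = exp(t - 1). Binary label y, binary sensitive s.


def f_star(t):
    """Convex conjugate of f(u) = u log u."""
    return np.exp(t - 1.0)


def dual_objective(P, Q, A):
    """sum_jk A_jk P_jk - Q_jk f*(A_jk); its maximum over A equals KL(P || Q)."""
    return float(np.sum(A * P - Q * f_star(A)))


def sigmoid(z):
    return 0.5 * (1.0 + np.tanh(0.5 * z))  # numerically stable logistic


def minibatch_grads(w, b, A, X, y, s, pi, lam):
    """Mini-batch gradients of logistic loss + lam * dual KL regularizer.

    pi[k] = P(s = k) is fixed from the full training set, so every term is a
    per-sample average and the mini-batch gradients are unbiased.
    """
    p1 = sigmoid(X @ w + b)                               # P(y_hat = 1 | x)
    F = np.stack([1.0 - p1, p1], axis=1)                  # soft predictions
    S = np.stack([s == 0, s == 1], axis=1).astype(float)  # one-hot of s
    fA = f_star(A)
    c = fA @ pi                                           # c_j = sum_k pi_k f*(A_jk)
    # Ascent direction for A (gradient of the regularizer w.r.t. A).
    grad_A = F.T @ S / len(y) - fA * pi[None, :] * F.mean(axis=0)[:, None]
    # Descent direction for (w, b): chain rule through p1 = sigmoid(z).
    coeff = (A[1, s] - c[1]) - (A[0, s] - c[0])
    dz = (p1 - y) + lam * p1 * (1.0 - p1) * coeff
    return X.T @ dz / len(y), dz.mean(), grad_A


def train_sgda(X, y, s, lam=1.0, batch_size=32, epochs=20,
               lr_theta=0.1, lr_A=0.1, seed=0):
    """Plain SGDA: descent on the model (w, b), ascent on the dual variable A."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b, A = np.zeros(d), 0.0, np.zeros((2, 2))
    pi = np.array([np.mean(s == 0), np.mean(s == 1)])
    for _ in range(epochs):
        for idx in np.array_split(rng.permutation(n), max(1, n // batch_size)):
            gw, gb, gA = minibatch_grads(w, b, A, X[idx], y[idx], s[idx], pi, lam)
            w, b = w - lr_theta * gw, b - lr_theta * gb
            A = np.clip(A + lr_A * gA, -5.0, 5.0)  # box keeps f*(A) bounded
    return w, b, A


def dp_gap(w, b, X, s):
    """Demographic parity gap |P(y_hat=1 | s=0) - P(y_hat=1 | s=1)|."""
    yhat = sigmoid(X @ w + b) >= 0.5
    return abs(yhat[s == 0].mean() - yhat[s == 1].mean())


# Usage on synthetic data where the first feature is shifted by s.
rng = np.random.default_rng(1)
n = 2000
s = rng.integers(0, 2, n)
X = np.column_stack([rng.normal(size=n) + s, rng.normal(size=n)])
y = (X[:, 0] + 0.5 * X[:, 1] + 0.3 * rng.normal(size=n) > 0.5).astype(float)
for lam in (0.0, 2.0):
    w, b, A = train_sgda(X, y, s, lam=lam)
    print(f"lam={lam}: DP gap = {dp_gap(w, b, X, s):.3f}")
```

Each gradient in `minibatch_grads` is a plain per-sample average. Averaging it over equal-sized chunks that partition the data therefore reproduces the full-batch gradient exactly, which is why a batch size of one remains valid. The printed gaps depend on the synthetic data and step sizes and are not meant to reproduce any result from the paper.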
