Fairness-Aware Meta-Learning via Nash Bargaining
Yi Zeng, Xuelin Yang, Li Chen, Cristian Canton Ferrer, Ming Jin, Michael I. Jordan, Ruoxi Jia
TL;DR
This work addresses group-level fairness in meta-learning by identifying hypergradient conflicts that destabilize one-stage fairness optimization. It introduces a two-stage framework, Nash-Meta-Learning: the first stage uses a Nash Bargaining Solution (NBS) to aggregate subgroup hypergradients and steer updates toward the Pareto front, and the second stage optimizes for a chosen fairness objective. The authors derive the NBS for gradient aggregation without linear independence assumptions, prove Pareto improvement and monotonic improvement in validation loss, and report empirical gains across six fairness datasets and two image classification tasks, with improvements of up to 10% in overall performance and up to 67% in disparity reduction. The approach offers a principled, game-theoretic mechanism for reconciling competing subgroup objectives in fairness-aware learning and shows robustness to varying fairness notions and data conditions, while highlighting the importance of validation-set quality.
Abstract
To address issues of group-level fairness in machine learning, it is natural to adjust model parameters based on specific fairness objectives over a sensitive-attributed validation set. Such an adjustment procedure can be cast within a meta-learning framework. However, naive integration of fairness goals via meta-learning can cause hypergradient conflicts for subgroups, resulting in unstable convergence and compromising model performance and fairness. To navigate this issue, we frame the resolution of hypergradient conflicts as a multi-player cooperative bargaining game. We introduce a two-stage meta-learning framework in which the first stage involves the use of a Nash Bargaining Solution (NBS) to resolve hypergradient conflicts and steer the model toward the Pareto front, and the second stage optimizes with respect to specific fairness goals. Our method is supported by theoretical results, notably a proof of the NBS for gradient aggregation free from linear independence assumptions, a proof of Pareto improvement, and a proof of monotonic improvement in validation loss. We also show empirical effects across various fairness objectives in six key fairness datasets and two image classification tasks.
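To make the first-stage bargaining step concrete, the following is a minimal sketch rather than the authors' algorithm. It assumes one common formulation of Nash bargaining for gradient aggregation: choose the update direction d inside the unit ball that maximizes the sum of log utilities, sum_i log(g_i · d), where g_i stands in for the hypergradient of subgroup i. The function names (`nash_direction`, `log_utility`), the toy 2-D gradients, and the simple projected gradient ascent solver with backtracking are all illustrative assumptions.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [a / n for a in v]

def log_utility(grads, d):
    """Sum of log utilities log(g_i . d); -inf if d conflicts with any g_i."""
    utils = [dot(g, d) for g in grads]
    if min(utils) <= 0.0:
        return -math.inf
    return sum(math.log(u) for u in utils)

def nash_direction(grads, steps=500, lr=0.05):
    """Hypothetical helper: maximize sum_i log(g_i . d) subject to ||d|| <= 1.

    Uses projected gradient ascent with backtracking. Assumes the sum of
    unit gradients is a feasible start (positive inner product with every g_i).
    """
    dim = len(grads[0])
    units = [normalize(g) for g in grads]
    d = normalize([sum(u[k] for u in units) for k in range(dim)])
    best = log_utility(grads, d)
    if best == -math.inf:
        raise ValueError("initial direction conflicts with some gradient")
    for _ in range(steps):
        # Gradient of the objective with respect to d: sum_i g_i / (g_i . d).
        asc = [sum(g[k] / dot(g, d) for g in grads) for k in range(dim)]
        step = lr
        while step > 1e-12:
            # The ascent step always leaves the unit ball, so projecting
            # back onto the ball is the same as normalizing.
            cand = normalize([d[k] + step * asc[k] for k in range(dim)])
            val = log_utility(grads, cand)
            if val > best:
                d, best = cand, val
                break
            step *= 0.5  # backtrack: candidate was infeasible or not better
        else:
            break  # no improving step found: treat as converged
    return d

# Toy subgroup gradients (assumed values); the first two conflict.
grads = [[4.0, 0.0], [-1.0, 0.5], [0.0, 1.0]]
mean = [sum(g[k] for g in grads) / len(grads) for k in range(2)]
d = nash_direction(grads)

print("g_i . mean:", [dot(g, mean) for g in grads])  # second entry is negative
print("g_i . nash:", [dot(g, d) for g in grads])     # all entries are positive
```

In this toy case the plain average of the gradients has a negative inner product with the second subgroup's gradient, so a descent step along it would increase that subgroup's loss to first order. The bargained direction keeps every inner product positive, so a sufficiently small descent step along it lowers every subgroup's loss to first order. Because the objective sums log utilities, rescaling any single gradient leaves the resulting direction unchanged, which is one reason a bargaining-style aggregate is less dominated by large-magnitude gradients than a plain average.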
