Incentives in Federated Learning with Heterogeneous Agents
Ariel D. Procaccia, Han Shao, Itai Shapira
TL;DR
This work studies incentive design in federated learning with heterogeneous agents under a PAC accuracy objective, where an agent’s utility depends on who provides each sample. It shows that uncoordinated play can be highly inefficient: pure equilibria may not exist, and even the best equilibrium can be far from the social optimum. Computing the cost-minimizing data-contribution plan is NP-hard, but a central planner with full information can obtain a logarithmic approximation in polynomial time via an LP relaxation. The authors also design a pay-what-you-contribute mechanism that is strategy-proof and unique among contribution-based transfers. The results offer a rigorous mechanism-design perspective on federated learning with heterogeneous data, bridging combinatorial optimization, learning theory, and public-good incentives, with implications for coordinating data sharing across diverse domains.
Abstract
Federated learning promises significant sample-efficiency gains by pooling data across multiple agents, yet incentive misalignment is an obstacle: each update is costly to the contributor but benefits every participant. We introduce a game-theoretic framework that captures heterogeneous data: an agent's utility depends on who supplies each sample, not just how many. Agents aim to meet a PAC-style accuracy threshold at minimal personal cost. We show that uncoordinated play yields pathologies: pure equilibria may not exist, and the best equilibrium can be arbitrarily more costly than cooperation. To steer collaboration, we analyze the cost-minimizing contribution vector, prove that computing it is NP-hard, and derive a polynomial-time linear program that achieves a logarithmic approximation. Finally, pairing the LP with a simple pay-what-you-contribute rule, where each agent receives a payment equal to its sample cost, yields a mechanism that is strategy-proof and unique within the class of contribution-based transfers.
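As a rough illustration of the pay-what-you-contribute rule described above, the sketch below computes transfers for a given contribution plan. It is a minimal sketch under stated assumptions, not the authors' implementation: the linear per-sample cost, the `Agent` class, the function names, and the example agents and plan are all hypothetical, and the plan is taken as given rather than computed by the LP.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    name: str
    cost_per_sample: int  # hypothetical: cost is assumed linear in samples


def pay_what_you_contribute(agents, plan):
    """Pay each agent exactly the cost of the samples the plan assigns to it."""
    return {a.name: a.cost_per_sample * plan[a.name] for a in agents}


def net_cost(agents, plan, payments):
    """Sample cost minus payment received, per agent."""
    return {
        a.name: a.cost_per_sample * plan[a.name] - payments[a.name]
        for a in agents
    }


# Hypothetical agents and contribution plan (samples requested from each).
agents = [Agent("a", 3), Agent("b", 5), Agent("c", 2)]
plan = {"a": 40, "b": 0, "c": 25}

payments = pay_what_you_contribute(agents, plan)
residual = net_cost(agents, plan, payments)
print(payments)
print(residual)
```

Under this rule every agent's sample cost is fully reimbursed, so the net cost of following the plan is zero regardless of how many samples the plan requests from that agent.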
