Gaming and Cooperation in Federated Learning: What Can Happen and How to Monitor It
Dongseok Kim, Hyoungsun Choi, Mohamed Jismy Aashik Rasool, Gisung Oh
TL;DR
The paper reframes federated learning (FL) as a governed strategic system in which evaluation, information disclosure, rewards, and audits shape participant incentives. It introduces a three-layer framework and three indices, Manipulability (M), Price of Gaming (PoG), and Price of Cooperation (PoC), that quantify how policy choices affect metric gaming and welfare and how participation dynamics respond over time. Through stylized simulations and a Fashion-MNIST case study, it shows that high-metric, low-welfare equilibria can emerge when evaluation focuses on a narrow metric, and that mixed public-private evaluation, targeted audits, and calibrated sanctions can curb gaming while preserving cooperation. The work provides a practical design toolkit (penalty calibration, mixed challenges, audit budgeting, and governance checklists) to guide FL platforms toward stable, high-welfare cooperation with reduced metric gaming.
Abstract
The success of federated learning (FL) ultimately depends on how strategic participants behave under partial observability, yet most formulations still treat FL as a static optimization problem. We instead view FL deployments as governed strategic systems and develop an analytical framework that separates welfare-improving behavior from metric gaming. Within this framework, we introduce indices that quantify manipulability, the price of gaming, and the price of cooperation, and we use them to study how rules, information disclosure, evaluation metrics, and aggregator-switching policies reshape incentives and cooperation patterns. We derive threshold conditions for deterring harmful gaming while preserving benign cooperation, and for triggering auto-switch rules when early-warning indicators become critical. Building on these results, we construct a design toolkit including a governance checklist and a simple audit-budget allocation algorithm with a provable performance guarantee. Simulations across diverse stylized environments and a federated learning case study consistently match the qualitative and quantitative patterns predicted by our framework. Taken together, our results provide design principles and operational guidelines for reducing metric gaming while sustaining stable, high-welfare cooperation in FL platforms.
