Incentivizing Honesty among Competitors in Collaborative Learning and Optimization
Florian E. Dorner, Nikola Konstantinov, Georgi Pashaliev, Martin Vechev
TL;DR
This work addresses the vulnerability of collaborative learning to strategic, competing participants who may send dishonest updates to the central model. It introduces a game-theoretic framework that models attack and defense actions and player rewards for mean estimation and multi-round SGD, then designs incentive mechanisms (transferable-utility penalties and server-side noise) that guarantee honest equilibria and preserve convergence. Theoretical results show that with appropriately tuned penalties, rational players abstain from manipulation and learning quality remains close to the fully cooperative benchmark, while experiments on FeMNIST and Twitter indicate that the incentive scheme remains effective for non-convex objectives. The findings suggest that explicitly modeling incentives, rather than assuming clients are malicious, yields robust, participation-preserving collaborative learning suitable for competitive environments.
Abstract
Collaborative learning techniques have the potential to enable training machine learning models that are superior to models trained on a single entity's data. However, in many cases, potential participants in such collaborative schemes are competitors on a downstream task, such as firms that each aim to attract customers by providing the best recommendations. This can incentivize dishonest updates that damage other participants' models, potentially undermining the benefits of collaboration. In this work, we formulate a game that models such interactions and study two learning tasks within this framework: single-round mean estimation and multi-round SGD on strongly-convex objectives. For a natural class of player actions, we show that rational clients are incentivized to strongly manipulate their updates, preventing learning. We then propose mechanisms that incentivize honest communication and ensure learning quality comparable to full cooperation. Lastly, we empirically demonstrate the effectiveness of our incentive scheme on a standard non-convex federated learning benchmark. Our work shows that explicitly modeling the incentives and actions of dishonest clients, rather than assuming them malicious, can enable strong robustness guarantees for collaborative learning.
