Game of Coding: Sybil Resistant Decentralized Machine Learning with Minimal Trust Assumption
Hanzaleh Akbari Nodehi, Viveck R. Cadambe, Mohammad Ali Maddah-Ali
TL;DR
This work studies Sybil resistance in decentralized machine learning (DeML) by formulating a Stackelberg game between a data collector (DC) and adversaries under repetition coding with $N$ nodes. It introduces a reduction method using $c^{\eta}_{N,t}(\alpha)$ and Algorithm 1 to compute the optimal acceptance threshold $\eta^*_{N,t}$, establishing a two-problem equivalence that makes the analysis tractable. Theoretical results show that $c^{\eta}_{N,t}(\alpha) = c^{\eta}_{N-t+1,1}(\alpha)$ for all $\alpha$, implying that the adversary's power collapses to that of the one-adversary scenario whenever at least one honest node is present. They also reveal a counterintuitive effect: adding honest nodes does not always increase the DC's utility. The framework yields explicit forms for the optimal noise distributions under strong symmetry (with Algorithm 2 to compute them) and demonstrates that liveness, i.e., the system remaining functional, is enhanced at equilibrium compared to traditional trust-based thresholds. Together, these results extend the game of coding to general $N \ge 2$ and offer practical tools for designing Sybil-resistant, incentive-aware DeML systems.
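The index reduction stated above, from (N, t) to (N - t + 1, 1), can be illustrated with a minimal sketch. It only performs the bookkeeping on the two indices and does not compute the quantity c itself. The function name `reduce_to_single_adversary` and the reading of t as the number of adversarial nodes are assumptions made for illustration.

```python
from typing import Tuple


def reduce_to_single_adversary(n_nodes: int, n_adversarial: int) -> Tuple[int, int]:
    """Map (N, t) to the equivalent single-adversary setting (N - t + 1, 1).

    Hypothetical helper: t is assumed to count adversarial nodes, and the
    reduction is assumed to need at least one adversary and one honest node.
    """
    if not 1 <= n_adversarial < n_nodes:
        raise ValueError("need 1 <= t < N (at least one adversary and one honest node)")
    return n_nodes - n_adversarial + 1, 1


if __name__ == "__main__":
    honest = 3  # number of honest nodes, held fixed
    for t in range(1, 6):
        # Adding adversarial (Sybil) nodes changes N and t together.
        print((honest + t, t), "->", reduce_to_single_adversary(honest + t, t))
```

With the number of honest nodes fixed at 3, every pair printed by the loop maps to (4, 1): under this reading, extra adversarial identities leave the reduced problem unchanged, which is the sense in which the reduction reflects Sybil resistance.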
Abstract
Coding theory plays a crucial role in ensuring data integrity and reliability across various domains, from communication to computation and storage systems. However, its reliance on trust assumptions for data recovery, which require the number of honest nodes to exceed the number of adversarial nodes by a certain margin, poses significant challenges, particularly in emerging decentralized systems where trust is a scarce resource. To address this, the game of coding framework was introduced, offering insights into strategies for data recovery within incentive-oriented environments. In such environments, participant nodes are rewarded as long as the system remains functional (live). This incentivizes adversaries to maximize their rewards (utility) by ensuring that the decoder, acting as the data collector (DC), successfully recovers the data, preferably with a high estimation error. This rational behavior is leveraged in a game-theoretic framework, referred to as the game of coding, whose equilibrium leads to a robust and resilient system. The earliest version of the game of coding was limited to scenarios involving only two nodes. In this paper, we generalize the game of coding framework to scenarios with $N \ge 2$ nodes, exploring critical aspects of system behavior. Specifically, we (i) demonstrate that the adversary's utility at equilibrium is non-increasing with additional adversarial nodes, ensuring no gain for the adversary and no pain for the DC, thus establishing the game of coding framework's Sybil resistance; (ii) show that increasing the number of honest nodes does not always enhance the DC's utility, providing examples and proposing an algorithm to identify and mitigate this counterintuitive effect; and (iii) outline the optimal strategies for both the DC and the adversary, demonstrating that the system achieves enhanced liveness at equilibrium.
