Game of Coding: Sybil Resistant Decentralized Machine Learning with Minimal Trust Assumption

Hanzaleh Akbari Nodehi, Viveck R. Cadambe, Mohammad Ali Maddah-Ali

TL;DR

This work studies Sybil resilience in decentralized machine learning (DeML) by formulating a Stackelberg game between a data collector (DC) and adversaries under repetition coding with $N$ nodes. It introduces a reduction method based on $c^{\eta}_{N,t}(\alpha)$ and uses Algorithm 1 to compute the optimal acceptance threshold $\eta^*_{N,t}$, establishing an equivalence between two problems that makes the analysis tractable. The theoretical results show that $c^{\eta}_{N,t}(\alpha) = c^{\eta}_{N-t+1,1}(\alpha)$ for all $\alpha$, so the adversary's power collapses to that of the single-adversary scenario whenever at least one honest node is present. They also reveal a counterintuitive effect: more honest nodes do not always increase the DC's utility. The framework yields explicit forms for the optimal noise distributions under strong symmetry (with Algorithm 2 to compute them) and demonstrates that liveness (system functionality) is enhanced at equilibrium compared with traditional trust-based thresholds. Together, these results extend the game of coding to general $N \ge 2$ and offer practical tools for designing Sybil-resistant, incentive-aware DeML systems.
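
As a concrete instance of the reduction stated above, substituting the node counts used in the paper's figures ($N=20$, $t=19$ in Figure 2 and $N=5$, $t=4$ in Figure 1) gives the following. The third line, with $N=5$ and $t=2$, is an illustrative choice that does not appear in the source.

```latex
\begin{align*}
N = 20,\ t = 19 &: \quad N - t + 1 = 2, \quad c^{\eta}_{20,19}(\alpha) = c^{\eta}_{2,1}(\alpha), \\
N = 5,\ t = 4   &: \quad N - t + 1 = 2, \quad c^{\eta}_{5,4}(\alpha)   = c^{\eta}_{2,1}(\alpha), \\
N = 5,\ t = 2   &: \quad N - t + 1 = 4, \quad c^{\eta}_{5,2}(\alpha)   = c^{\eta}_{4,1}(\alpha).
\end{align*}
```

In each case the reduced system keeps the $N-t$ honest nodes and a single adversarial node.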

Abstract

Coding theory plays a crucial role in ensuring data integrity and reliability across various domains, from communication to computation and storage systems. However, its reliance on trust assumptions for data recovery, which requires the number of honest nodes to exceed adversarial nodes by a certain margin, poses significant challenges, particularly in emerging decentralized systems where trust is a scarce resource. To address this, the game of coding framework was introduced, offering insights into strategies for data recovery within incentive-oriented environments. In such environments, participant nodes are rewarded as long as the system remains functional (live). This incentivizes adversaries to maximize their rewards (utility) by ensuring that the decoder, as the data collector (DC), successfully recovers the data, preferably with a high estimation error. This rational behavior is leveraged in a game-theoretic framework, where the equilibrium leads to a robust and resilient system, referred to as the game of coding. The focus of the earliest version of the game of coding was limited to scenarios involving only two nodes. In this paper, we generalize the game of coding framework to scenarios with $N \ge 2$ nodes, exploring critical aspects of system behavior. Specifically, we (i) demonstrate that the adversary's utility at equilibrium is non-increasing with additional adversarial nodes, ensuring no gain for the adversary and no pain for the DC, thus establishing the game of coding framework's Sybil resistance; (ii) show that increasing the number of honest nodes does not always enhance the DC's utility, providing examples and proposing an algorithm to identify and mitigate this counterintuitive effect; and (iii) outline the optimal strategies for both the DC and the adversary, demonstrating that the system achieves enhanced liveness at equilibrium.

Paper Structure

This paper contains 25 sections, 13 theorems, 139 equations, 4 figures, 2 algorithms.

Key Result

Theorem 1

Let $\hat{\eta}_{N,t}$ be the output of Algorithm 1. We have $\eta^*_{N,t} = \hat{\eta}_{N,t}$.

Figures (4)

  • Figure 1: This figure illustrates a system with $N=5$ nodes, where $4$ of them are adversarial, shown in red. Each node's task is to output $\mathbf{u}$, but this process is subject to noise. Honest nodes experience noise given by $\mathbf{n}_h$, while adversarial nodes have noise $[\mathbf{n}_a]_{a \in \mathcal{T}} \sim g(\cdot)$, with $g(\cdot)$ representing an arbitrary distribution independent of $\mathbf{u}$, and $\mathcal{T} = \{2,3,4,5\}$. Upon receiving the data, i.e., $\underline{\mathbf{y}} \triangleq (\mathbf{y}_1,\dots,\mathbf{y}_5)$, the DC checks whether $\max(\underline{\mathbf{y}}) - \min(\underline{\mathbf{y}}) \leq \eta \Delta$. If this condition is not met, the DC rejects the input; otherwise, it accepts the input and outputs $\frac{\max(\underline{\mathbf{y}}) + \min(\underline{\mathbf{y}})}{2}$ as its estimate.
  • Figure 2: The curves of $c^{\eta}_{20,19}(\cdot)$ for $\eta \in \{2, 2.25, 2.5, \dots, 8\}$. The lowest curve (blue) represents $c^{2}_{20,19}(\cdot)$, and the highest curve (red) represents $c^{8}_{20,19}(\cdot)$. The green and red circles correspond to $\mathcal{L}^{\eta}_{20,19}$ for Case 1 and Case 2, respectively. The black and yellow circles indicate the equilibria for Case 1 and Case 2, respectively.
  • Figure 3: Utility of the DC for different numbers of honest nodes in Example \ref{first_example_equilibrium}. Here, the DC commits to $\eta^*_{20,19}$, computed under the worst-case assumption of one honest node and $t=19$ adversarial nodes. In practice, however, there may be more honest nodes (and hence fewer adversaries), even though the DC is unaware of the realized composition. Each curve therefore starts at the worst-case point (one honest node and $t=19$). The blue curve corresponds to Case 1, and the red curve corresponds to Case 2. The important observation is that the DC’s utility does not necessarily increase when the number of honest nodes exceeds what the DC perceives. In Case 1 (blue curve), the utility functions are proper, whereas in Case 2 (red curve), they are not.
  • Figure 4: In the first scenario, there are $t$ adversarial nodes and $N-t$ honest nodes. The noise distribution of the adversarial nodes is $g^*_t(\cdot)$, which is a solution to \ref{C_definition} and satisfies Lemma \ref{lemma:there_is_non_cancelling_noise}. Let $\mathbf{n}_{\text{abs}} \triangleq \max_{a \in \{1,2,\dots,t\}} |\mathbf{n}_a|$. In the second scenario, there are again $t$ adversarial nodes and $N-t$ honest nodes, and the noise of every adversarial node is $\mathbf{n}_{\text{abs}}$. In the third scenario, there is one adversarial node and $N-t$ honest nodes, and the noise of the adversarial node is $\mathbf{n}_{\text{abs}}$.
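
The acceptance rule described in the Figure 1 caption can be sketched in a few lines of Python. This is a minimal illustration, not the paper's code. The function name `dc_decide`, the use of `None` to signal rejection, and the treatment of reports as plain floats are all assumptions made here. The parameter `delta` stands for the $\Delta$ in the caption's threshold $\eta \Delta$; this excerpt does not define it, so it is simply passed in.

```python
from typing import Optional, Sequence


def dc_decide(y: Sequence[float], eta: float, delta: float) -> Optional[float]:
    """Hypothetical sketch of the DC's accept/reject rule from the Figure 1 caption.

    y     : reports y_1, ..., y_N received from the N nodes (scalars, by assumption)
    eta   : acceptance threshold chosen by the DC
    delta : the scale Delta in the threshold eta * Delta (not defined in this excerpt)

    Returns the midrange estimate (max(y) + min(y)) / 2 if the reports are
    accepted, or None if the DC rejects the input.
    """
    if len(y) < 2:
        raise ValueError("the rule is stated for N >= 2 reports")
    lo, hi = min(y), max(y)
    # Reject when the spread of the reports exceeds eta * Delta.
    if hi - lo > eta * delta:
        return None
    # Otherwise accept and output the midpoint of the extreme reports.
    return (hi + lo) / 2


# Spread is 1.0 <= 2.0, so the input is accepted with estimate 1.5.
print(dc_decide([1.0, 1.5, 2.0, 1.2, 1.8], eta=2.0, delta=1.0))
# Spread is 5.0 > 2.0, so the input is rejected (prints None).
print(dc_decide([0.0, 5.0], eta=2.0, delta=1.0))
```

Because only the extreme reports enter both the test and the estimate, an adversary can influence the outcome only by moving the maximum or the minimum, which is in line with the reduction to a single adversarial node illustrated in Figure 4.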

Theorems & Definitions (34)

  • Definition 1
  • Definition 2
  • Definition 3
  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Remark 1
  • Remark 2
  • Theorem 4
  • Remark 3
  • ...and 24 more