ROSS: RObust decentralized Stochastic learning based on Shapley values

Lina Wang, Yunsheng Yuan, Feng Li, Lingjie Duan

TL;DR

This work tackles robust decentralized stochastic learning under heterogeneous and adversarial data. It introduces ROSS, a serverless algorithm that weights cross-gradient information using Shapley values and updates models with momentum, achieving robustness to non-IID data and poisoning. The authors prove a convergence rate of $\mathcal{O}\left(\frac{1}{\sqrt{NT}}\right)$, indicating a linear speedup in the number of agents, and validate the approach with extensive experiments on MNIST and CIFAR-10 across various topologies and threat models. The results demonstrate that Shapley-valued aggregation of cross-gradients yields superior convergence and prediction accuracy in realistic decentralized settings, suggesting practical impact for scalable, robust distributed learning.
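As a short worked step on why this rate is read as a linear speedup: assume the hidden constant in the bound is some fixed $C > 0$ that does not depend on $N$ or $T$ (an assumption of this sketch, not a statement from the paper), and let $\varepsilon$ be a target accuracy. Then

$$
\frac{C}{\sqrt{NT}} \le \varepsilon
\quad\Longleftrightarrow\quad
T \ge \frac{C^{2}}{N\,\varepsilon^{2}} .
$$

Under that assumption, the number of rounds $T$ needed to reach accuracy $\varepsilon$ shrinks in proportion to $1/N$, so doubling the number of agents roughly halves the required rounds.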

Abstract

In decentralized learning, a group of agents collaborates to learn a global model from a distributed dataset without a central server. This paradigm is severely challenged by heterogeneity in the data distribution across agents: for example, the data may not be independent and identically distributed (non-IID), and may even be noisy or poisoned. To address these challenges, we propose ROSS, a novel robust decentralized stochastic learning algorithm based on Shapley values. Specifically, in each round, each agent aggregates the cross-gradient information from its neighbors, i.e., the derivatives of its local model with respect to the datasets of its neighbors, to update its local model in a momentum-like manner, weighting the derivatives according to their contributions as measured by Shapley values. Our theoretical analysis establishes a linear speedup in convergence for ROSS. We also verify the efficacy of the algorithm through extensive experiments on public datasets. The results demonstrate that, in the face of these data challenges, ROSS has significant advantages over existing state-of-the-art proposals in both convergence and prediction accuracy.
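The per-round idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the utility function (loss reduction on a held-out batch after a trial step), the clipping of negative Shapley values to zero, the heavy-ball momentum form, and all names (`shapley_values`, `make_utility`, `shapley_weighted_step`, `val_loss`, `cross_grads`) are assumptions made for illustration. Here `cross_grads[j]` stands for the gradient of one agent's local model evaluated on neighbor `j`'s data.

```python
import itertools
import numpy as np


def shapley_values(players, utility):
    """Exact Shapley values: average marginal contribution over all orderings."""
    players = list(players)
    totals = {p: 0.0 for p in players}
    orderings = list(itertools.permutations(players))
    for order in orderings:
        coalition = set()
        prev = utility(frozenset(coalition))
        for p in order:
            coalition.add(p)
            cur = utility(frozenset(coalition))
            totals[p] += cur - prev  # marginal contribution of p in this ordering
            prev = cur
    return {p: t / len(orderings) for p, t in totals.items()}


def make_utility(x, cross_grads, val_loss, lr):
    """Hypothetical utility: loss reduction after a trial step with the
    coalition's mean cross-gradient (assumption, not from the paper)."""
    def utility(coalition):
        if not coalition:
            return 0.0
        mean_grad = np.mean([cross_grads[j] for j in coalition], axis=0)
        return val_loss(x) - val_loss(x - lr * mean_grad)
    return utility


def shapley_weighted_step(x, momentum, cross_grads, val_loss, lr=0.1, beta=0.9):
    """One momentum update using Shapley-weighted cross-gradients."""
    phi = shapley_values(cross_grads, make_utility(x, cross_grads, val_loss, lr))
    # Assumption: harmful contributions (negative value) get zero weight.
    clipped = {j: max(v, 0.0) for j, v in phi.items()}
    total = sum(clipped.values())
    if total > 0:
        weights = {j: v / total for j, v in clipped.items()}
    else:
        # Fall back to uniform weights if no gradient helps.
        weights = {j: 1.0 / len(clipped) for j in clipped}
    agg = sum(weights[j] * cross_grads[j] for j in cross_grads)
    momentum = beta * momentum + agg  # heavy-ball style momentum (assumption)
    return x - lr * momentum, momentum, weights
```

Enumerating all orderings costs on the order of $n!$ utility evaluations for $n$ neighbors, so this exact form is only practical for small neighborhoods; sampling a subset of orderings is a common approximation.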

Paper Structure

This paper contains 25 sections, 8 theorems, 87 equations, 12 figures, 2 algorithms.

Key Result

Theorem 1

Suppose Assumptions (ass:L-smooth) through (ass:ds_mat) hold, and the learning rate $\gamma$ satisfies the following condition: [...]. For any $T \geq 1$, we have [...] for Algorithm (alg:ross), where $\mathcal{F}^*$ denotes the optimal value of the objective function, and [...]

Figures (12)

  • Figure 1: Comparison results on MNIST dataset over fully connected graphs.
  • Figure 2: Comparison results on MNIST dataset over bipartite graphs.
  • Figure 3: Comparison results in terms of test accuracy on MNIST dataset over fully connected graphs.
  • Figure 4: Comparison results in terms of test accuracy on MNIST dataset over bipartite graphs.
  • Figure 5: Comparison results on CIFAR-10 dataset over fully connected graphs.
  • ...and 7 more figures

Theorems & Definitions (10)

  • Definition 1: Cooperative Game
  • Definition 2: Shapley Value
  • Theorem 1
  • Corollary 1
  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Lemma 4
  • Lemma 5
  • Lemma 6
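
For reference, Definitions 1 and 2 name standard game-theoretic notions. The textbook forms are sketched here; the paper's own notation is not shown on this page and may differ, and the symbols $\mathcal{P}$, $n$, $v$, and $\phi_i$ are chosen for this sketch. A cooperative game is a pair $(\mathcal{P}, v)$ with a player set $\mathcal{P}$ of size $n$ and a value function $v : 2^{\mathcal{P}} \to \mathbb{R}$ with $v(\emptyset) = 0$. The Shapley value of player $i$ is

$$
\phi_i(v) \;=\; \sum_{S \subseteq \mathcal{P} \setminus \{i\}}
\frac{|S|!\,(n - |S| - 1)!}{n!}\,
\bigl( v(S \cup \{i\}) - v(S) \bigr),
$$

and the values satisfy the efficiency property $\sum_{i \in \mathcal{P}} \phi_i(v) = v(\mathcal{P})$.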