PDSL: Privacy-Preserved Decentralized Stochastic Learning with Heterogeneous Data Distribution

Lina Wang, Yunsheng Yuan, Chunxiao Wang, Feng Li

TL;DR

PDSL addresses privacy-preserving decentralized learning with heterogeneous data by combining differential privacy with Shapley-value-based weighting of cross-gradient information. It perturbs both local and cross-gradients with Gaussian noise, computes Shapley-based contributions via Monte Carlo estimation, and uses these weights to form a momentum-like update across a gossip-structured network. The authors prove DP guarantees per round and establish convergence under standard smoothness and connectivity assumptions, showing a rate that scales as O(1/√T) with privacy-dependent terms. Empirically, PDSL outperforms DP-based baselines on MNIST and CIFAR-10 across fully connected, bipartite, and ring topologies, demonstrating robustness to increasing agent counts and stronger privacy requirements.
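The Monte Carlo Shapley estimation mentioned above can be illustrated with a permutation-sampling sketch. This is a generic estimator, not the authors' implementation: the function name `monte_carlo_shapley`, the toy utility, and the neighbor labels are all hypothetical, and the utility PDSL actually evaluates on cross-gradients is not given in this summary. The final normalization mirrors the ratio $\hat{\varphi}_{i,j} / \sum_{k} \hat{\varphi}_{i,k}$ that appears in Theorem 1.

```python
import random


def monte_carlo_shapley(players, utility, num_permutations=200, seed=0):
    """Estimate Shapley values by averaging marginal contributions
    over randomly sampled orderings of the players."""
    rng = random.Random(seed)
    totals = {p: 0.0 for p in players}
    for _ in range(num_permutations):
        order = list(players)
        rng.shuffle(order)
        coalition = []
        prev_value = utility(frozenset())
        for p in order:
            coalition.append(p)
            value = utility(frozenset(coalition))
            # Marginal contribution of p given the players that precede it.
            totals[p] += value - prev_value
            prev_value = value
    return {p: total / num_permutations for p, total in totals.items()}


def toy_utility(coalition):
    # Hypothetical additive utility: each neighbor adds a fixed score.
    scores = {"a": 1.0, "b": 2.0, "c": 3.0}
    return sum(scores[p] for p in coalition)


shapley = monte_carlo_shapley(["a", "b", "c"], toy_utility)

# Normalize the estimates so they can serve as aggregation weights.
total = sum(shapley.values())
weights = {p: v / total for p, v in shapley.items()}
```

Because the toy utility is additive, every sampled ordering yields the same marginal contribution for each neighbor, so the estimate coincides with the exact Shapley value. For a non-additive utility the estimate would carry sampling error that shrinks as `num_permutations` grows.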

Abstract

In the paradigm of decentralized learning, a group of agents collaborates to learn a global model using distributed datasets without a central server. However, due to the heterogeneity of the local data across the different agents, learning a robust global model is rather challenging. Moreover, the collaboration of the agents relies on their gradient information exchange, which poses a risk of privacy leakage. In this paper, to address these issues, we propose PDSL, a novel privacy-preserved decentralized stochastic learning algorithm with heterogeneous data distribution. On one hand, we innovate in utilizing the notion of Shapley values such that each agent can precisely measure the contributions of its heterogeneous neighbors to the global learning goal; on the other hand, we leverage the notion of differential privacy to prevent each agent from suffering privacy leakage when it contributes gradient information to its neighbors. We conduct both solid theoretical analysis and extensive experiments to demonstrate the efficacy of our PDSL algorithm in terms of privacy preservation and convergence.

Paper Structure

This paper contains 19 sections, 9 theorems, 102 equations, 6 figures, 2 tables, 2 algorithms.

Key Result

Theorem 1

For any $\epsilon > 0$ and $\delta \in (0,1)$, the PDSL algorithm guarantees $(\epsilon,\delta)$-DP in each round if a condition (not preserved in this extract) holds, where $C$ is the clipping threshold of any gradient, $\omega_{\min} = \min_{i \in \mathcal{M}, j \in \mathcal{M}_i} \omega_{i,j}$, and $\hat{\varphi}_{\min} = \min_{j \in \mathcal{M}_i, t \in T} \frac{\hat{\varphi}^{[t]}_{i,j}}{\sum_{k \in \mathcal{M}_i} \hat{\varphi}^{[t]}_{i,k}}$.
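The roles of the clipping threshold $C$ and the Gaussian perturbation can be sketched as follows. This is a minimal illustration of the standard clip-then-add-noise pattern, assuming $\ell_2$-norm clipping; the helper names are hypothetical, and `sigma` is left as a free parameter because the noise calibration required by Theorem 1 is not reproduced in this summary.

```python
import math
import random


def clip_gradient(grad, clip_threshold):
    """Rescale grad so that its L2 norm is at most clip_threshold."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return list(grad)
    scale = min(1.0, clip_threshold / norm)
    return [g * scale for g in grad]


def perturb_gradient(grad, clip_threshold, sigma, rng):
    """Clip the gradient, then add independent Gaussian noise per coordinate.

    sigma is an assumed input here; it is NOT derived from (epsilon, delta).
    """
    clipped = clip_gradient(grad, clip_threshold)
    return [g + rng.gauss(0.0, sigma) for g in clipped]


rng = random.Random(0)
clipped = clip_gradient([3.0, 4.0], clip_threshold=1.0)
noisy = perturb_gradient([3.0, 4.0], clip_threshold=1.0, sigma=0.1, rng=rng)
```

Clipping bounds how much any single gradient can change the shared message, which is what makes it possible to calibrate the noise scale to a target privacy level.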

Figures (6)

  • Figure 1: Comparison results on MNIST dataset over fully connected graphs.
  • Figure 2: Comparison results on MNIST dataset over bipartite graphs.
  • Figure 3: Comparison results on MNIST dataset over ring graphs.
  • Figure 4: Comparison results on CIFAR-10 dataset over fully connected graphs.
  • Figure 5: Comparison results on CIFAR-10 dataset over bipartite graphs.
  • ...and 1 more figure

Theorems & Definitions (19)

  • Definition 1: $(\epsilon, \delta)$-Differential Privacy
  • Definition 2: Sensitivity
  • Definition 3: Cooperative Game
  • Definition 4: Shapley Value
  • Theorem 1
  • Theorem 2
  • Corollary 1
  • Lemma 1
  • proof
  • Lemma 2
  • ...and 9 more
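
For orientation, the first, second, and fourth definitions listed above are usually stated as follows in the literature. These are the standard textbook forms, written with assumed notation (a randomized mechanism $\mathcal{A}$, adjacent datasets $D, D'$, a query $f$, a player set $N$, and a utility function $v$); the paper's own statements may differ in notation or in the norm used for sensitivity.

```latex
% (\epsilon, \delta)-differential privacy: for all adjacent datasets D, D'
% and every measurable set S of outputs,
\Pr[\mathcal{A}(D) \in S] \le e^{\epsilon} \, \Pr[\mathcal{A}(D') \in S] + \delta .

% \ell_2-sensitivity of a query f over adjacent datasets:
\Delta f = \max_{D, D'} \lVert f(D) - f(D') \rVert_2 .

% Shapley value of player j in a cooperative game (N, v):
\varphi_j(v) = \sum_{S \subseteq N \setminus \{j\}}
  \frac{|S|! \, (|N| - |S| - 1)!}{|N|!}
  \bigl( v(S \cup \{j\}) - v(S) \bigr) .
```

The exact Shapley sum ranges over all subsets of the other players, so its cost grows exponentially with $|N|$, which is the usual motivation for a Monte Carlo estimate.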