Generalization Error Matters in Decentralized Learning Under Byzantine Attacks

Haoxiang Ye, Qing Ling

TL;DR

This work addresses generalization in decentralized learning under Byzantine attacks by formulating a Byzantine-resilient DSGD framework and applying uniform stability to derive generalization bounds for strongly convex, convex, and non-convex losses. The analysis reveals that Byzantine agents introduce non-vanishing generalization error terms that depend on the contraction constant $\rho$, on the topology via $\chi$, and on the number of non-Byzantine agents, so the generalization error cannot reach zero even with infinite data. Numerical experiments on Erdős-Rényi networks validate the theory and show that robust aggregation rules such as IOS can mitigate, but not eliminate, the generalization penalty caused by Byzantine agents. The results underscore the fundamental role of communication topology and robust aggregation in shaping generalization in decentralized learning with adversaries, and point to future work on time-uniform bounds and joint optimization-generalization strategies.

Abstract

Recently, decentralized learning has emerged as a popular peer-to-peer signal and information processing paradigm that enables model training across geographically distributed agents in a scalable manner, without any central server. When some of the agents are malicious (also termed Byzantine), resilient decentralized learning algorithms are able to limit the impact of these Byzantine agents without knowing their number or identities, and have guaranteed optimization errors. However, an analysis of the generalization errors, which are critical to implementations of the trained models, is still lacking. In this paper, we provide the first analysis of the generalization errors for a class of popular Byzantine-resilient decentralized stochastic gradient descent (DSGD) algorithms. Our theoretical results reveal that the generalization errors cannot be entirely eliminated because of the presence of the Byzantine agents, even if the number of training samples is infinitely large. Numerical experiments are conducted to confirm our theoretical results.
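
To make the class of algorithms in the abstract concrete, the following is a minimal sketch of one round of Byzantine-resilient DSGD: each non-Byzantine agent takes a local gradient step and then robustly aggregates the messages it receives. Several things here are assumptions for illustration and are not taken from the paper: the coordinate-wise trimmed mean stands in for the robust aggregation rule (it is not IOS), the names `trimmed_mean`, `resilient_dsgd_step`, and `toy_run` are hypothetical, the gradient in the toy run is deterministic rather than stochastic, and each Byzantine agent is assumed to send the same vector to all of its neighbors.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def trimmed_mean(vectors: Sequence[Vector], trim: int) -> Vector:
    """Coordinate-wise trimmed mean (an assumed stand-in aggregation rule):
    per coordinate, drop the `trim` smallest and `trim` largest values,
    then average what remains."""
    if len(vectors) <= 2 * trim:
        raise ValueError("need more than 2 * trim vectors to aggregate")
    aggregated = []
    for d in range(len(vectors[0])):
        column = sorted(v[d] for v in vectors)
        kept = column[trim:len(column) - trim]
        aggregated.append(sum(kept) / len(kept))
    return aggregated


def resilient_dsgd_step(
    models: Dict[int, Vector],
    neighbors: Dict[int, List[int]],
    grad: Callable[[int, Vector], Vector],
    byzantine_messages: Dict[int, Vector],
    step_size: float,
    trim: int,
) -> Dict[int, Vector]:
    """One round for the non-Byzantine agents listed in `models`."""
    # 1) Local gradient step at every non-Byzantine agent.
    half = {
        n: [x - step_size * g for x, g in zip(w, grad(n, w))]
        for n, w in models.items()
    }
    # 2) Byzantine agents send arbitrary vectors instead of honest updates
    #    (simplification: the same vector goes to every neighbor).
    messages = {**half, **byzantine_messages}
    # 3) Each non-Byzantine agent robustly aggregates its own intermediate
    #    model together with the messages from its neighbors.
    return {
        n: trimmed_mean([half[n]] + [messages[m] for m in neighbors[n]], trim)
        for n in models
    }


def toy_run(rounds: int = 200) -> Dict[int, Vector]:
    """Hypothetical toy setup: agents 0..3 are honest with scalar losses
    0.5 * (w - target)^2, agent 4 is Byzantine, and the graph is complete."""
    targets = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}

    def toy_grad(n: int, w: Vector) -> Vector:
        return [w[0] - targets[n]]

    neighbors = {n: [m for m in range(5) if m != n] for n in range(4)}
    models = {n: [0.0] for n in range(4)}
    for _ in range(rounds):
        models = resilient_dsgd_step(
            models, neighbors, toy_grad, {4: [1000.0]}, step_size=0.1, trim=1
        )
    return models
```

In this toy run the honest agents settle near 3.0, whereas the minimizer of the averaged honest losses is 2.5: trimming discards the Byzantine value but also one honest value each round, which leaves a persistent bias. This is only an intuition for why Byzantine agents can leave a residual error, not a reproduction of the paper's bounds.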

Paper Structure

This paper contains 33 sections, 6 theorems, 98 equations, 4 figures, 1 table, 1 algorithm.

Key Result

Lemma 1

If a stochastic algorithm $\mathcal{L}$ is $\epsilon$-uniformly stable, then its generalization error satisfies $| {\mathbb{E}}_{\mathcal{S},\mathcal{L}}[F(\mathcal{L}(\mathcal{S}))-F_{\mathcal{S}}(\mathcal{L}(\mathcal{S}))] |\leq \epsilon$.
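
Lemma 1 turns a stability estimate into a generalization bound, so it helps to see what a stability estimate measures. The sketch below is a rough Monte Carlo proxy under stated assumptions, not the paper's analysis: it uses single-machine SGD on a scalar least-squares loss rather than decentralized DSGD, it compares one fixed pair of datasets that differ in a single sample rather than taking a supremum over all such pairs, and it averages over a finite set of seeds in place of the expectation over the algorithm's randomness. The names `sgd_least_squares`, `loss`, and `stability_gap` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (feature, label)


def sgd_least_squares(
    data: Sequence[Sample], steps: int, step_size: float, seed: int
) -> float:
    """Plain SGD on 0.5 * (w * x - y)^2 with a seeded sampling order."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        x, y = data[rng.randrange(len(data))]
        w -= step_size * (w * x - y) * x
    return w


def loss(w: float, z: Sample) -> float:
    x, y = z
    return 0.5 * (w * x - y) ** 2


def stability_gap(
    data: Sequence[Sample],
    replacement: Sample,
    index: int,
    probes: Sequence[Sample],
    steps: int = 200,
    step_size: float = 0.05,
    seeds: Sequence[int] = tuple(range(20)),
) -> float:
    """Proxy for the stability parameter: replace one sample, rerun SGD with
    the same seeds, and report the largest absolute seed-averaged loss
    difference over the probe points."""
    neighbor: List[Sample] = list(data)
    neighbor[index] = replacement
    pairs = [
        (
            sgd_least_squares(data, steps, step_size, s),
            sgd_least_squares(neighbor, steps, step_size, s),
        )
        for s in seeds
    ]
    return max(
        abs(sum(loss(w, z) - loss(w2, z) for w, w2 in pairs) / len(pairs))
        for z in probes
    )
```

Because the probe set and the dataset pair are fixed, the returned value can only underestimate the true stability parameter; it is meant to illustrate the quantity that Lemma 1 bounds the generalization error by, not to certify it.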

Figures (4)

  • Figure 1: Generalization error and testing accuracy of attack-free and Byzantine-resilient DSGD with strongly convex loss.
  • Figure 2: Generalization error and testing accuracy of attack-free and Byzantine-resilient DSGD with convex loss.
  • Figure 3: Generalization error and testing accuracy of attack-free and Byzantine-resilient DSGD with non-convex loss.
  • Figure 4: Generalization error and testing accuracy of Byzantine-resilient DSGD using IOS with convex loss and different $|{\mathcal{R}}|$.

Theorems & Definitions (17)

  • Definition 1: Virtual mixing matrix and contraction constant corresponding to $\{\mathcal{A}_n\}_{n \in {\mathcal{R}}}$
  • Definition 2: Uniform stability (Bousquet and Elisseeff, 2002)
  • Lemma 1: Generalization error via uniform stability (Hardt et al., 2016)
  • Theorem 1: Generalization error of Byzantine-resilient DSGD with strongly convex loss
  • Theorem 2: Generalization error of Byzantine-resilient DSGD with convex loss
  • Remark 1
  • Theorem 3: Generalization error of Byzantine-resilient DSGD with non-convex loss
  • Remark 2
  • Remark 3
  • Remark 4
  • ...and 7 more