Dynamic Byzantine-Robust Learning: Adapting to Switching Byzantine Workers

Ron Dorfman, Naseem Yehya, Kfir Y. Levy

TL;DR

This paper addresses dynamic Byzantine faults in distributed SGD, where the set of Byzantine workers can change between rounds. It introduces DynaBRO, a method that combines multi-level Monte Carlo (MLMC) gradient estimation with a fail-safe filter and an adaptive learning rate. DynaBRO tolerates any sublinear number of identity-switch rounds and, when that number is $\mathcal{O}(\sqrt{T})$, converges nearly as fast as the state-of-the-art rate for the static setting. A Median-Filtered Mean aggregator and an AdaGrad-Norm learning rate remove the need for prior knowledge of the noise level or the Byzantine fraction. Experiments on MNIST and CIFAR-10 show resilience to dynamic attacks under several identity-switching strategies, with DynaBRO outperforming momentum and SGD baselines in challenging dynamic regimes.
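To make the adaptive learning rate concrete, here is a minimal sketch of a generic AdaGrad-Norm update: a single scalar step size that shrinks with the accumulated squared gradient norm. The function name `adagrad_norm_step`, the parameters `eta0` and `eps`, and the toy quadratic objective are assumptions for illustration; this is the textbook rule, not necessarily the exact variant analyzed in the paper.

```python
import math


def adagrad_norm_step(x, grad, sq_norm_sum, eta0=1.0, eps=1e-12):
    """One AdaGrad-Norm update on a parameter vector stored as a list of floats.

    sq_norm_sum is the running sum of squared gradient norms seen so far.
    Returns the updated parameters and the updated running sum.
    """
    # Accumulate the squared Euclidean norm of the current gradient.
    sq_norm_sum += sum(g * g for g in grad)
    # One scalar step size shared by all coordinates (the "Norm" variant).
    eta = eta0 / math.sqrt(sq_norm_sum + eps)
    new_x = [xi - eta * gi for xi, gi in zip(x, grad)]
    return new_x, sq_norm_sum


# Toy usage (assumed objective): minimize f(x) = 0.5 * ||x||^2, whose gradient is x.
x = [3.0, -4.0]
acc = 0.0
for _ in range(200):
    x, acc = adagrad_norm_step(x, x, acc, eta0=1.0)
```

The step size depends only on observed gradient norms, which is why a rule of this kind needs no noise level or Byzantine fraction as input.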

Abstract

Byzantine-robust learning has emerged as a prominent fault-tolerant distributed machine learning framework. However, most techniques focus on the static setting, wherein the identity of Byzantine workers remains unchanged throughout the learning process. This assumption fails to capture real-world dynamic Byzantine behaviors, which may include intermittent malfunctions or targeted, time-limited attacks. Addressing this limitation, we propose DynaBRO -- a new method capable of withstanding any sub-linear number of identity changes across rounds. Specifically, when the number of such changes is $\mathcal{O}(\sqrt{T})$ (where $T$ is the total number of training rounds), DynaBRO nearly matches the state-of-the-art asymptotic convergence rate of the static setting. Our method utilizes a multi-level Monte Carlo (MLMC) gradient estimation technique applied at the server to robustly aggregated worker updates. By additionally leveraging an adaptive learning rate, we circumvent the need for prior knowledge of the fraction of Byzantine workers.

Paper Structure (65 sections, 43 theorems, 188 equations, 8 figures, 2 tables, 2 algorithms)

Key Result

Lemma 3.0

For $\mathcal{M}_f$ satisfying eq:lmgo, we have that

Figures (8)

  • Figure 1: Final test accuracy on MNIST under the Periodic($K$) identity-switching strategy for different values of $K$. Byzantine workers implement the SF attack and the server employs CWTM.
  • Figure 2: Test accuracy and histogram of the fraction of Byzantine workers over time on CIFAR-10 under the Bernoulli($p, D, \delta_{\max}$) identity-switching strategy for different values of $p$ and $D$. Byzantine workers employ the IPM attack and the server uses CWMed.
  • Figure 3: Optimality gap ($f(x_t)-f^*$) under static (top) and dynamic (bottom) attacks across various momentum parameters and for different attack strengths ($\lambda=0, 0.5, 1, 2, 5$). The average and 95% confidence interval are presented over $20$ random seeds.
  • Figure 4: Optimization trajectories under static (top) and dynamic (bottom) attacks for a range of momentum parameters and for different attack strengths ($\lambda=0, 0.5, 1, 2, 5$). Note that under the dynamic attack, the algorithm converges to a sub-optimal solution.
  • Figure 5: Test accuracy on MNIST under the Periodic($K$) identity-switching strategy for different values of $K$. Byzantine workers employ the SF attack and the server implements CWTM aggregation.
  • ...and 3 more figures
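The captions above refer to the CWMed and CWTM aggregators. Assuming these abbreviate coordinate-wise median and coordinate-wise trimmed mean, as is common in the Byzantine-robust learning literature, a minimal sketch looks like this. The function names, the `num_trim` parameter, and the toy worker updates are illustrative and not taken from the paper.

```python
import statistics


def cw_median(vectors):
    """Coordinate-wise median of a list of equal-length vectors."""
    return [statistics.median(col) for col in zip(*vectors)]


def cw_trimmed_mean(vectors, num_trim):
    """Per coordinate, drop the num_trim smallest and largest values, then average."""
    n = len(vectors)
    if 2 * num_trim >= n:
        raise ValueError("num_trim must be smaller than half the number of vectors")
    out = []
    for col in zip(*vectors):
        kept = sorted(col)[num_trim:n - num_trim]
        out.append(sum(kept) / len(kept))
    return out


# Toy example: four honest workers and one worker sending an extreme update.
honest = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [1.0, 2.0]]
byzantine = [[-100.0, 100.0]]
updates = honest + byzantine

median_agg = cw_median(updates)
trimmed_agg = cw_trimmed_mean(updates, num_trim=1)
```

Both rules keep the aggregate close to the honest updates in this example, whereas a plain average would be pulled far away by the single extreme vector.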

Theorems & Definitions (73)

  • Lemma 3.0
  • Definition 3.1: $(\delta, \kappa_{\delta})$-robustness
  • Lemma 3.1
  • Theorem 3.2
  • Theorem 4.1
  • Corollary 4.2
  • Lemma 5.1: Informal
  • Theorem 5.2
  • Lemma 1.1: Convex SGD
  • proof
  • ...and 63 more