Dynamic Byzantine-Robust Learning: Adapting to Switching Byzantine Workers

Ron Dorfman; Naseem Yehya; Kfir Y. Levy

TL;DR

This paper addresses dynamic Byzantine faults in distributed SGD, where the set of Byzantine workers can change between rounds. It introduces DynaBRO, which combines multi-level Monte Carlo (MLMC) gradient estimation with a fail-safe filter and an adaptive learning rate. DynaBRO tolerates any sublinear number of identity-switch rounds and, when that number is $\mathcal{O}(\sqrt{T})$, nearly matches the convergence rate of the static setting. A Median-Filtered Mean aggregator and an AdaGrad-Norm learning rate remove the need for prior knowledge of the noise level or the Byzantine fraction. Experiments on MNIST and CIFAR-10 show resilience to dynamic attacks under several identity-switching strategies, outperforming momentum and SGD baselines in dynamic regimes.
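As a rough illustration of the adaptive learning rate mentioned above, the following sketch implements the standard AdaGrad-Norm rule, in which the step size is a base rate divided by the square root of the accumulated squared gradient norms. It is not the paper's implementation: the function name `adagrad_norm_sgd`, the parameters `eta0` and `eps`, and the toy quadratic objective are all assumptions made for this example.

```python
import numpy as np

def adagrad_norm_sgd(grad_fn, x0, eta0, num_steps, eps=1e-12):
    """Gradient descent with an AdaGrad-Norm step size (illustrative sketch).

    grad_fn   : callable returning a gradient estimate at x (hypothetical interface)
    eta0      : base learning rate (assumed name)
    eps       : small constant guarding against division by zero (assumed)
    """
    x = np.asarray(x0, dtype=float).copy()
    sum_sq = 0.0  # running sum of squared gradient norms
    for _ in range(num_steps):
        g = np.asarray(grad_fn(x), dtype=float)
        sum_sq += float(np.dot(g, g))
        # Step size shrinks as gradient energy accumulates; no noise level is supplied.
        x = x - eta0 / np.sqrt(sum_sq + eps) * g
    return x

# Toy usage: minimize f(x) = 0.5 * ||x||^2, whose gradient is x itself.
x_final = adagrad_norm_sgd(lambda x: x, x0=[1.0, -2.0], eta0=1.0, num_steps=200)
```

The only tunable quantity here is the base rate `eta0`; the scaling by accumulated gradient norms is what lets the step size adapt without a noise estimate.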

Abstract

Byzantine-robust learning has emerged as a prominent fault-tolerant distributed machine learning framework. However, most techniques focus on the static setting, wherein the identity of Byzantine workers remains unchanged throughout the learning process. This assumption fails to capture real-world dynamic Byzantine behaviors, which may include intermittent malfunctions or targeted, time-limited attacks. Addressing this limitation, we propose DynaBRO -- a new method capable of withstanding any sub-linear number of identity changes across rounds. Specifically, when the number of such changes is $\mathcal{O}(\sqrt{T})$ (where $T$ is the total number of training rounds), DynaBRO nearly matches the state-of-the-art asymptotic convergence rate of the static setting. Our method utilizes a multi-level Monte Carlo (MLMC) gradient estimation technique applied at the server to robustly aggregated worker updates. By additionally leveraging an adaptive learning rate, we circumvent the need for prior knowledge of the fraction of Byzantine workers.
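To make the idea of MLMC estimation on top of robust aggregation more concrete, here is a minimal sketch of a generic truncated MLMC estimator. It assumes the common construction in which a level $J$ is drawn with $P(J=j)=2^{-j}$ and the estimate is a base term plus $2^J$ times the difference between a fine and a coarse aggregate. The names `mlmc_estimate`, `sample_worker_grads`, `aggregate`, and `max_level` are hypothetical, and the sketch omits the fail-safe filter; the paper's exact construction may differ.

```python
import numpy as np

def mlmc_estimate(sample_worker_grads, aggregate, max_level, rng):
    """Truncated MLMC estimate built from robustly aggregated gradients (sketch).

    sample_worker_grads(n) -> array of shape (n, m, d): n rounds of gradients
                              from m workers (hypothetical interface)
    aggregate(v)           -> robust aggregate of an (m, d) array, shape (d,)
    max_level              -> truncation level for the geometric draw (assumed)
    """
    level = int(rng.geometric(0.5))  # P(level = j) = 2^{-j} for j >= 1
    # Base term: aggregate of a single round of worker gradients.
    base = aggregate(sample_worker_grads(1).mean(axis=0))
    if level > max_level:
        return base
    grads = sample_worker_grads(2 ** level)  # shape (2^level, m, d)
    # Fine term averages all 2^level rounds per worker; coarse uses the first half.
    fine = aggregate(grads.mean(axis=0))
    coarse = aggregate(grads[: 2 ** (level - 1)].mean(axis=0))
    return base + (2 ** level) * (fine - coarse)

# Toy usage: 5 workers, worker 0 always sends a huge vector, median aggregation.
def toy_sampler(n, honest=np.array([1.0, -2.0])):
    g = np.tile(honest, (n, 5, 1))
    g[:, 0, :] = 1e6
    return g

estimate = mlmc_estimate(
    toy_sampler, lambda v: np.median(v, axis=0), max_level=5,
    rng=np.random.default_rng(0),
)
```

Because the expensive levels are drawn with geometrically decaying probability, the expected number of rounds consumed per estimate stays small even though high levels average over many rounds.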

Paper Structure (65 sections, 43 theorems, 188 equations, 8 figures, 2 tables, 2 algorithms)

Introduction
Preliminaries and Related Work
Problem Formulation and Assumptions
Related Work
The importance of history for Byzantine-robustness.
Byzantine-robustness and worker sampling.
MLMC estimation.
Warm-up: Static Robustness with MLMC
Motivation
MLMC Gradient Estimation
Byzantine-Robustness with MLMC Gradients
DynaBRO: Dynamic Byzantine-Robustness
MLMC fail-safe filter.
When worker-momentum fails.
Optimality and Adaptivity
...and 50 more sections

Key Result

Lemma 3.0

For $\mathcal{M}_f$ satisfying eq:lmgo, we have that

Figures (8)

Figure 1: Final test accuracy on MNIST under the Periodic($K$) identity-switching strategy for different values of $K$. Byzantine workers implement the SF attack and the server employs CWTM.
Figure 2: Test accuracy and histogram of the fraction of Byzantine workers over time on CIFAR-10 under the Bernoulli($p, D, \delta_{\max}$) identity-switching strategy for different values of $p$ and $D$. Byzantine workers employ the IPM attack and the server uses CWMed.
Figure 3: Optimality gap ($f(x_t)-f^*$) under static (top) and dynamic (bottom) attacks across various momentum parameters and for different attack strengths ($\lambda=0, 0.5, 1, 2, 5$). The average and 95% confidence interval are presented over $20$ random seeds.
Figure 4: Optimization trajectories under static (top) and dynamic (bottom) attacks for a range of momentum parameters and for different attack strengths ($\lambda=0, 0.5, 1, 2, 5$). Note that under the dynamic attack, the algorithm converges to a sub-optimal solution.
Figure 5: Test accuracy on MNIST under the Periodic($K$) identity-switching strategy for different values of $K$. Byzantine workers employ the SF attack and the server implements CWTM aggregation.
...and 3 more figures
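The captions above name two server-side aggregation rules, CWTM and CWMed, without expanding them. Assuming they denote the coordinate-wise trimmed mean and coordinate-wise median that are standard in the Byzantine-robust literature, a minimal sketch looks as follows. The function names and the `num_trim` parameter are hypothetical and not taken from the paper.

```python
import numpy as np

def cw_trimmed_mean(worker_vectors, num_trim):
    """Coordinate-wise trimmed mean: per coordinate, drop the num_trim smallest
    and num_trim largest values across workers, then average the rest."""
    v = np.sort(np.asarray(worker_vectors, dtype=float), axis=0)
    m = v.shape[0]
    if not 0 <= 2 * num_trim < m:
        raise ValueError("num_trim must satisfy 0 <= 2 * num_trim < number of workers")
    return v[num_trim : m - num_trim].mean(axis=0)

def cw_median(worker_vectors):
    """Coordinate-wise median across workers."""
    return np.median(np.asarray(worker_vectors, dtype=float), axis=0)

# Toy usage: five workers, one of which sends an outlier in each coordinate.
updates = [[1, 1], [2, 2], [3, 3], [100, -100], [2, 2]]
trimmed = cw_trimmed_mean(updates, num_trim=1)
median = cw_median(updates)
```

Both rules operate independently on each coordinate, so a single extreme worker can shift the output only within the range spanned by the remaining values.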

Theorems & Definitions (73)

Lemma 3.0
Definition 3.1: $(\delta, \kappa_{\delta})$-robustness
Lemma 3.1
Theorem 3.2
Theorem 4.1
Corollary 4.2
Lemma 5.1: Informal
Theorem 5.2
Lemma 1.1: Convex SGD
proof
...and 63 more
