Byzantine-Robust Federated Learning: Impact of Client Subsampling and Local Updates

Youssef Allouah, Sadegh Farhadkhani, Rachid Guerraoui, Nirupam Gupta, Rafael Pinot, Geovani Rizk, Sasha Voitovych

TL;DR

This work studies Byzantine-robust federated learning in the presence of client subsampling and multiple local updates. It analyzes $\mathsf{FedRo}$, a variant of $\mathsf{FedAvg}$ in which the server replaces plain averaging with a robust aggregation rule, and provides a convergence theory that ties the sample size $\hat{n}$, the Byzantine bound $\hat{b}$, and the number of local steps $K$ to the algorithm's performance. The authors derive a sufficient condition on $\hat{n}$ and $\hat{b}$ that ensures convergence with high probability. They show that the learning error comprises a vanishing optimization term and a persistent Byzantine term, that the Byzantine term can be reduced by increasing $K$ under a careful choice of step sizes, and that sampling more clients yields diminishing returns once $\hat{n}$ exceeds a threshold $\hat{n}_{opt}$. They further give practical prescriptions for choosing $\hat{n}$ and $\hat{b}$ via the sample-size thresholds $\hat{n}_{th}$ (for convergence) and $\hat{n}_{opt}$ (for order-optimal error), and validate the theory with experiments on FEMNIST and CIFAR-10 under Byzantine attacks.

Abstract

The possibility of adversarial (a.k.a. \emph{Byzantine}) clients makes federated learning (FL) prone to arbitrary manipulation. The natural approach to robustify FL against adversarial clients is to replace the simple averaging operation at the server in the standard $\mathsf{FedAvg}$ algorithm by a \emph{robust averaging rule}. While a significant amount of work has been devoted to studying the convergence of federated \emph{robust averaging} (which we denote by $\mathsf{FedRo}$), prior work has largely ignored the impact of \emph{client subsampling} and \emph{local steps}, two fundamental FL characteristics. Client subsampling increases the effective fraction of Byzantine clients, and local steps increase the drift between the local updates computed by honest (i.e., non-Byzantine) clients. Consequently, a careless deployment of $\mathsf{FedRo}$ could yield poor performance. We validate this observation by presenting an in-depth analysis of $\mathsf{FedRo}$ that tightly characterizes the impact of client subsampling and local steps. Specifically, we present a sufficient condition on client subsampling for nearly-optimal convergence of $\mathsf{FedRo}$ (for smooth non-convex loss). We also show that the rate of improvement in learning accuracy \emph{diminishes} with respect to the number of clients subsampled as soon as the sample size exceeds a threshold value. Interestingly, we also observe that, under a careful choice of step-sizes, the learning error due to Byzantine clients decreases with the number of local steps. We validate our theory by experiments on the FEMNIST and CIFAR-10 image classification tasks.
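As a rough illustration of the mechanism the abstract describes, the following sketch runs one round of a FedRo-style procedure: the server samples $\hat{n}$ clients without replacement, each client performs $K$ local steps, and the server aggregates with a robust rule instead of the plain average. This is a minimal sketch under stated assumptions, not the paper's Algorithm 1: the function names (`trimmed_mean`, `local_sgd`, `fedro_round`) are hypothetical, clients are modeled as plain callables, the server is assumed to aggregate the returned local models directly, and coordinate-wise trimmed mean is used only because it is one of the aggregation rules named in the figures.

```python
import random


def trimmed_mean(vectors, trim):
    """Coordinate-wise trimmed mean.

    For each coordinate, drop the `trim` smallest and `trim` largest
    values, then average what remains.
    """
    if 2 * trim >= len(vectors):
        raise ValueError("trim must be less than half the number of vectors")
    dim = len(vectors[0])
    result = []
    for j in range(dim):
        column = sorted(v[j] for v in vectors)
        kept = column[trim:len(column) - trim]
        result.append(sum(kept) / len(kept))
    return result


def local_sgd(model, grad_fn, steps, lr):
    """Run `steps` local gradient steps starting from the server model."""
    w = list(model)
    for _ in range(steps):
        g = grad_fn(w)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w


def fedro_round(model, clients, n_hat, b_hat, steps, lr, rng: random.Random):
    """One round: subsample, run local steps, robustly aggregate.

    `clients` is a list of callables (model, steps, lr) -> vector.
    An honest client returns its locally updated model; a Byzantine
    client may return anything. `b_hat` is the assumed upper bound on
    the number of Byzantine clients in the sample (an assumption of
    this sketch: it is used directly as the trimming parameter).
    """
    sampled = rng.sample(clients, n_hat)  # sampling without replacement
    updates = [client(model, steps, lr) for client in sampled]
    return trimmed_mean(updates, b_hat)
```

In this sketch the trimming only protects the round when the number of Byzantine clients actually sampled does not exceed `b_hat`, which is why the choice of $\hat{n}$ and $\hat{b}$ matters once clients are subsampled.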

Paper Structure (34 sections, 25 theorems, 143 equations, 4 figures, 5 tables, 1 algorithm)

Introduction
Problem Statement
Robust Federated Learning
Theoretical Analysis
Sufficient Condition on $\hat{n}$ and $\hat{b}$
Convergence of $\mathsf{FedRo}$
On the Choice of $\hat{b}$ and $\hat{n}$
Sample Size Threshold for Convergence
Sample Size Threshold for Order-Optimal Error
Empirical Results
Conclusion
Convergence Proof
Basic definitions and notations
Skeleton of the proof for Theorem \ref{th:main}
Combining all (proof of Theorem \ref{th:main})
...and 19 more sections

Key Result

Lemma 1

Let $p < 1$ and $b$ be such that $0 < b/n < 1/2$. Consider $\mathsf{FedRo}$ as defined in Algorithm 1. Suppose that $\hat{n}$ and $\hat{b}$ are such that $b/n < \hat{b}/\hat{n} < 1/2$ and with $D\left(\alpha, \beta\right) := \alpha\ln\left(\frac{\alpha}{\beta}\right) + (1 - \alpha)\ln\left(\frac{1 - \alpha}{1 - \beta}\right)$, for $\alpha, \beta \in (0,1)$. Then, Event …
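To make the quantities in the lemma concrete, the sketch below evaluates $D(\alpha, \beta)$ as defined above and computes, for a single round, the exact probability that more than $\hat{b}$ Byzantine clients land in a sample of size $\hat{n}$. It assumes clients are drawn uniformly without replacement, so the count of sampled Byzantine clients is hypergeometric. The third function is the standard Chernoff-Hoeffding tail bound expressed through $D$; it is included only for comparison and is not a restatement of the lemma's condition, which is truncated above. All function names are hypothetical.

```python
from math import comb, exp, log


def binary_kl(alpha, beta):
    """D(alpha, beta) = alpha*ln(alpha/beta) + (1-alpha)*ln((1-alpha)/(1-beta))."""
    if not (0 < alpha < 1 and 0 < beta < 1):
        raise ValueError("alpha and beta must lie in (0, 1)")
    return alpha * log(alpha / beta) + (1 - alpha) * log((1 - alpha) / (1 - beta))


def prob_too_many_byzantine(n, b, n_hat, b_hat):
    """Exact P[more than b_hat Byzantine clients among n_hat sampled].

    Assumes uniform sampling without replacement from n clients,
    b of which are Byzantine (hypergeometric distribution).
    """
    total = comb(n, n_hat)
    bad = sum(
        comb(b, k) * comb(n - b, n_hat - k)
        for k in range(b_hat + 1, min(b, n_hat) + 1)
    )
    return bad / total


def chernoff_tail_bound(n, b, n_hat, b_hat):
    """Standard bound exp(-n_hat * D(b_hat/n_hat, b/n)) on the same tail.

    Valid when b/n < b_hat/n_hat < 1; shown for comparison only.
    """
    return exp(-n_hat * binary_kl(b_hat / n_hat, b / n))
```

For example, `prob_too_many_byzantine(100, 10, 20, 5)` gives the single-round failure probability for $n = 100$, $b = 10$, $\hat{n} = 20$, $\hat{b} = 5$, and `chernoff_tail_bound` with the same arguments gives an upper bound on it.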

Figures (4)

Figure 1: Variation of $\hat{n}_{th}$ and $\hat{n}_{opt}$ with respect to the fraction of Byzantine clients.
Figure 2: Accuracy of $\mathsf{FedRo}$ with respect to the number of subsampled clients on the FEMNIST dataset.
Figure 3: Accuracy of $\mathsf{FedRo}$ with NNM and Trimmed Mean on the FEMNIST dataset (left) and on the CIFAR-10 dataset (right).
Figure 4: Accuracy of $\mathsf{FedRo}$ with respect to the number of local steps on FEMNIST (left) and on CIFAR-10 (right).

Theorems & Definitions (46)

Definition 1: $(n, b, \varepsilon)$-Byzantine resilience
Definition 2: $(\hat{n}, \hat{b}, \kappa)$-robustness
Lemma 1
Theorem 1
Corollary 1
Lemma 2
Lemma 3
Lemma 4
Remark 1
Lemma 4
...and 36 more
