Byzantine-Robust Federated Learning: Impact of Client Subsampling and Local Updates

Youssef Allouah, Sadegh Farhadkhani, Rachid Guerraoui, Nirupam Gupta, Rafael Pinot, Geovani Rizk, Sasha Voitovych

TL;DR

This work tackles Byzantine-robust federated learning in the presence of client subsampling and multiple local updates. It studies $\mathsf{FedRo}$, a variant of $\mathsf{FedAvg}$ in which the server replaces plain averaging with a robust averaging rule, and provides a convergence theory that ties the sample size $\hat{n}$, the Byzantine bound $\hat{b}$, and the number of local steps $K$ to the algorithm’s performance. The authors derive a sufficient condition on $\hat{n}$ and $\hat{b}$ that ensures convergence with high probability. They show that the learning error comprises a vanishing optimization term and a persistent Byzantine term, and that the latter can be reduced by increasing $K$ under a careful choice of step-sizes. They also show that the gain from sampling more clients diminishes once the sample size exceeds a threshold $\hat{n}_{opt}$. Finally, they offer practical prescriptions for choosing $\hat{n}$ and $\hat{b}$ via the sample-size thresholds $\hat{n}_{th}$ and $\hat{n}_{opt}$, and validate the theory with experiments on FEMNIST and CIFAR-10, demonstrating improved robustness under Byzantine attacks while quantifying the trade-off between communication and accuracy.

Abstract

The possibility of adversarial (a.k.a. Byzantine) clients makes federated learning (FL) prone to arbitrary manipulation. The natural approach to robustify FL against adversarial clients is to replace the simple averaging operation at the server in the standard $\mathsf{FedAvg}$ algorithm by a robust averaging rule. While a significant amount of work has been devoted to studying the convergence of federated robust averaging (which we denote by $\mathsf{FedRo}$), prior work has largely ignored the impact of client subsampling and local steps, two fundamental FL characteristics. Client subsampling increases the effective fraction of Byzantine clients, and local steps increase the drift between the local updates computed by honest (i.e., non-Byzantine) clients. Consequently, a careless deployment of $\mathsf{FedRo}$ could yield poor performance. We validate this observation with an in-depth analysis of $\mathsf{FedRo}$ that tightly characterizes the impact of client subsampling and local steps. Specifically, we present a sufficient condition on client subsampling for nearly-optimal convergence of $\mathsf{FedRo}$ (for smooth non-convex loss). We also show that the rate of improvement in learning accuracy diminishes with respect to the number of clients subsampled as soon as the sample size exceeds a threshold value. Interestingly, we also observe that, under a careful choice of step-sizes, the learning error due to Byzantine clients decreases with the number of local steps. We validate our theory by experiments on the FEMNIST and CIFAR-10 image classification tasks.
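
To make the interplay described in the abstract concrete, the following is a minimal sketch of one server round of a FedRo-style procedure: subsample $\hat{n}$ clients, let each run $K$ local gradient steps, and aggregate with a robust rule that uses the bound $\hat{b}$. It is an illustration under stated assumptions, not the paper's algorithm: the coordinate-wise trimmed mean stands in for the robust averaging rule, the server aggregates local models directly without a separate server step-size, and every name (`trimmed_mean`, `local_steps`, `make_honest_client`, `byzantine_client`, `robust_round`) as well as the toy quadratic losses are hypothetical.

```python
import random


def trimmed_mean(vectors, b_hat):
    """Coordinate-wise trimmed mean: per coordinate, drop the b_hat smallest
    and the b_hat largest values, then average what remains."""
    n_hat = len(vectors)
    if not 0 <= 2 * b_hat < n_hat:
        raise ValueError("need 0 <= b_hat < n_hat / 2")
    aggregated = []
    for j in range(len(vectors[0])):
        column = sorted(v[j] for v in vectors)
        kept = column[b_hat:n_hat - b_hat]
        aggregated.append(sum(kept) / len(kept))
    return aggregated


def local_steps(model, grad_fn, num_steps, lr):
    """Run num_steps plain gradient steps starting from the server model."""
    x = list(model)
    for _ in range(num_steps):
        g = grad_fn(x)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x


def make_honest_client(center):
    """Hypothetical honest client with toy loss 0.5 * ||x - center||^2."""
    def client(model, num_steps, lr):
        def grad_fn(x):
            return [xi - ci for xi, ci in zip(x, center)]
        return local_steps(model, grad_fn, num_steps, lr)
    return client


def byzantine_client(model, num_steps, lr):
    """Hypothetical attacker: ignores the protocol and sends a huge vector."""
    return [1e6 for _ in model]


def robust_round(model, clients, n_hat, b_hat, num_steps, lr, rng):
    """One round: sample n_hat clients without replacement, collect their
    local models, and aggregate them with the trimmed mean."""
    sampled = rng.sample(clients, n_hat)
    local_models = [client(model, num_steps, lr) for client in sampled]
    return trimmed_mean(local_models, b_hat)


if __name__ == "__main__":
    rng = random.Random(0)
    # 8 honest clients with different optima (heterogeneity) and 2 attackers.
    clients = [make_honest_client([c / 10.0, -c / 10.0]) for c in range(8)]
    clients += [byzantine_client, byzantine_client]
    model = [0.0, 0.0]
    for _ in range(20):
        model = robust_round(model, clients, n_hat=10, b_hat=2,
                             num_steps=5, lr=0.1, rng=rng)
    print(model)
```

In this toy setup all ten clients are sampled, so at most $\hat{b} = 2$ of them are Byzantine and the trimmed mean discards their values. If `n_hat` is set below the number of clients, a round can by chance contain more than `b_hat` attackers, in which case the trimmed mean no longer filters them; this is the subsampling effect the abstract warns about.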

Paper Structure

This paper contains 34 sections, 25 theorems, 143 equations, 4 figures, 5 tables, and 1 algorithm.

Key Result

Lemma 1

Let $p < 1$ and $b$ be such that $0 < b/n < 1/2$. Consider $\mathsf{FedRo}$ as defined in Algorithm 1. Suppose that $\hat{n}$ and $\hat{b}$ are such that $b/n < \hat{b}/\hat{n} < 1/2$ and […] with $D\left(\alpha, \beta\right) := \alpha\ln\left(\frac{\alpha}{\beta}\right) + (1 - \alpha)\ln\left(\frac{1 - \alpha}{1 - \beta}\right)$, for $\alpha, \beta \in (0,1)$. Then, Event …
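
The function $D(\alpha, \beta)$ in Lemma 1 is the Kullback-Leibler divergence between Bernoulli distributions with parameters $\alpha$ and $\beta$. The sketch below evaluates it and, to illustrate why the ratio $\hat{b}/\hat{n}$ is required to exceed $b/n$, computes the probability that a single round samples more than $\hat{b}$ Byzantine clients. It assumes the $\hat{n}$ clients are drawn uniformly without replacement from $n$ clients of which $b$ are Byzantine, so that the Byzantine count in the sample is hypergeometric. That sampling model, the choice of this particular event, and the function names are assumptions, since the lemma's full condition and event are not stated in this summary. The last function is the standard Hoeffding-type tail bound $\exp\left(-\hat{n}\, D(\hat{b}/\hat{n}, b/n)\right)$ for the hypergeometric distribution, included only for comparison.

```python
from math import comb, exp, log


def bernoulli_kl(alpha, beta):
    """D(alpha, beta) = alpha*ln(alpha/beta) + (1-alpha)*ln((1-alpha)/(1-beta))."""
    if not (0.0 < alpha < 1.0 and 0.0 < beta < 1.0):
        raise ValueError("alpha and beta must lie in (0, 1)")
    return (alpha * log(alpha / beta)
            + (1.0 - alpha) * log((1.0 - alpha) / (1.0 - beta)))


def prob_too_many_byzantine(n, b, n_hat, b_hat):
    """Exact probability that a uniform sample of n_hat clients (without
    replacement) contains strictly more than b_hat of the b Byzantine ones."""
    total = comb(n, n_hat)
    tail = sum(comb(b, k) * comb(n - b, n_hat - k)
               for k in range(b_hat + 1, min(b, n_hat) + 1))
    return tail / total


def kl_tail_bound(n, b, n_hat, b_hat):
    """Upper bound exp(-n_hat * D(b_hat/n_hat, b/n)) on the probability of
    sampling at least b_hat Byzantine clients (needs b/n < b_hat/n_hat < 1)."""
    return exp(-n_hat * bernoulli_kl(b_hat / n_hat, b / n))


if __name__ == "__main__":
    # Hypothetical numbers: 100 clients, 10 Byzantine, 20 sampled per round.
    n, b, n_hat = 100, 10, 20
    for b_hat in (3, 5, 7, 9):
        print(b_hat,
              prob_too_many_byzantine(n, b, n_hat, b_hat),
              kl_tail_bound(n, b, n_hat, b_hat))
```

Both quantities shrink as $\hat{b}/\hat{n}$ moves further above $b/n$, which is consistent with the lemma phrasing its requirement through $D$.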

Figures (4)

  • Figure 1: Variation of $\hat{n}_{th}$ and $\hat{n}_{opt}$ with respect to the fraction of Byzantine clients.
  • Figure 2: Accuracy of $\mathsf{FedRo}$ with respect to the number of subsampled clients on the FEMNIST dataset.
  • Figure 3: Accuracy of $\mathsf{FedRo}$ with NNM and Trimmed Mean on the FEMNIST dataset (left) and the CIFAR-10 dataset (right).
  • Figure 4: Accuracy of $\mathsf{FedRo}$ with respect to the number of local steps on FEMNIST (left) and CIFAR-10 (right).

Theorems & Definitions (46)

  • Definition 1: $(n, b, \varepsilon)$-Byzantine resilience
  • Definition 2: $(\hat{n}, \hat{b}, \kappa)$-robustness
  • Lemma 1
  • Theorem 1
  • Corollary 1
  • Lemma 2
  • Lemma 3
  • Lemma 4
  • Remark 1
  • Lemma 4
  • ...and 36 more