Byzantine-Robust Federated Learning: Impact of Client Subsampling and Local Updates
Youssef Allouah, Sadegh Farhadkhani, Rachid Guerraoui, Nirupam Gupta, Rafael Pinot, Geovani Rizk, Sasha Voitovych
TL;DR
This work studies Byzantine-robust federated learning under client subsampling and multiple local updates. It analyzes FedRo, a variant of FedAvg that replaces server-side averaging with a robust aggregation rule, and provides a convergence theory that ties the sample size $\hat n$, the Byzantine bound $\hat b$, and the number of local steps $K$ to the algorithm’s performance. The authors derive a sufficient condition on $\hat n$ and $\hat b$ that ensures convergence with high probability. They show that the learning error comprises a vanishing optimization term and a persistent Byzantine term, and that the Byzantine term can be reduced by increasing $K$ under a careful choice of step sizes. Increasing the sample size yields diminishing returns once it exceeds a threshold $\hat n_{opt}$. They further offer practical prescriptions for choosing $\hat n$ and $\hat b$ via the sample-size thresholds $\hat n_{th}$ and $\hat n_{opt}$, and validate the theory with FEMNIST and CIFAR-10 experiments, which demonstrate improved robustness under Byzantine attacks and quantify the trade-off between communication and accuracy.
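As a rough illustration of why a condition on $\hat n$ and $\hat b$ can only hold with high probability, the sketch below computes the chance that a random sample of $\hat n$ clients contains more than $\hat b$ Byzantine clients. Everything specific here is an assumption made for illustration rather than a detail taken from the paper: uniform sampling without replacement, the reading of $\hat b$ as the number of Byzantine clients tolerated in a sample, the totals `n` and `b`, and the function name.

```python
from math import comb


def prob_too_many_byzantine(n: int, b: int, n_hat: int, b_hat: int) -> float:
    """Hypergeometric tail probability.

    Assumed setting: n clients in total, b of them Byzantine, and n_hat
    clients drawn uniformly at random without replacement. Returns the
    probability that strictly more than b_hat sampled clients are Byzantine.
    """
    if not (0 <= b <= n and 0 < n_hat <= n and b_hat >= 0):
        raise ValueError("expected 0 <= b <= n, 0 < n_hat <= n, b_hat >= 0")
    total = comb(n, n_hat)
    # Sum over every "bad" outcome: k Byzantine clients in the sample, k > b_hat.
    # math.comb returns 0 when the lower index exceeds the upper one, so
    # infeasible combinations contribute nothing.
    tail = sum(
        comb(b, k) * comb(n - b, n_hat - k)
        for k in range(b_hat + 1, min(b, n_hat) + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical population: 100 clients, 10 of them Byzantine.
    for n_hat in (10, 20, 40):
        # Tolerate just under half of the sample being Byzantine.
        b_hat = (n_hat - 1) // 2
        print(n_hat, b_hat, prob_too_many_byzantine(100, 10, n_hat, b_hat))
```

Under these assumptions, sweeping `n_hat` for a fixed tolerated fraction `b_hat / n_hat` shows how the failure probability changes with the sample size, which is the kind of dependence that sample-size thresholds are meant to capture.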
Abstract
The possibility of adversarial (a.k.a., {\em Byzantine}) clients makes federated learning (FL) prone to arbitrary manipulation. The natural approach to robustify FL against adversarial clients is to replace the simple averaging operation at the server in the standard $\mathsf{FedAvg}$ algorithm by a \emph{robust averaging rule}. While a significant amount of work has been devoted to studying the convergence of federated {\em robust averaging} (which we denote by $\mathsf{FedRo}$), prior work has largely ignored the impact of {\em client subsampling} and {\em local steps}, two fundamental FL characteristics. Client subsampling increases the effective fraction of Byzantine clients, whereas local steps increase the drift between the local updates computed by honest (i.e., non-Byzantine) clients. Consequently, a careless deployment of $\mathsf{FedRo}$ could yield poor performance. We validate this observation with an in-depth analysis of $\mathsf{FedRo}$ that tightly characterizes the impact of client subsampling and local steps. Specifically, we present a sufficient condition on client subsampling for nearly optimal convergence of $\mathsf{FedRo}$ (for smooth non-convex loss). We also show that the rate of improvement in learning accuracy {\em diminishes} with respect to the number of clients subsampled, as soon as the sample size exceeds a threshold value. Interestingly, we also observe that under a careful choice of step-sizes, the learning error due to Byzantine clients decreases with the number of local steps. We validate our theory by experiments on the FEMNIST and CIFAR-$10$ image classification tasks.
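To make the structure described above concrete, here is a minimal sketch of one round of federated robust averaging with client subsampling and local steps. It is not the authors' implementation. The coordinate-wise trimmed mean is only one possible robust averaging rule, and the names `trimmed_mean`, `honest_client`, and `fedro_round`, the unit server step size, and the modeling of a Byzantine client as a function returning an arbitrary vector are all assumptions made for illustration.

```python
import random
from typing import Callable, List

Vector = List[float]
Client = Callable[[Vector], Vector]  # maps the global model to an update


def trimmed_mean(updates: List[Vector], b_hat: int) -> Vector:
    """Coordinate-wise trimmed mean (an assumed choice of robust rule):
    per coordinate, drop the b_hat smallest and b_hat largest values."""
    n_hat = len(updates)
    if n_hat <= 2 * b_hat:
        raise ValueError("need n_hat > 2 * b_hat")
    result = []
    for j in range(len(updates[0])):
        column = sorted(u[j] for u in updates)
        kept = column[b_hat:n_hat - b_hat]
        result.append(sum(kept) / len(kept))
    return result


def honest_client(grad: Callable[[Vector], Vector], K: int, lr: float) -> Client:
    """Honest client: runs K local gradient steps from the global model
    and reports the resulting model difference."""
    def update(theta: Vector) -> Vector:
        local = list(theta)
        for _ in range(K):
            g = grad(local)
            local = [w - lr * gi for w, gi in zip(local, g)]
        return [l - t for l, t in zip(local, theta)]
    return update


def fedro_round(theta: Vector, clients: List[Client], n_hat: int,
                b_hat: int, rng: random.Random) -> Vector:
    """One round: subsample n_hat clients, collect their updates, and apply
    the robustly aggregated update (server step size 1, an assumption)."""
    sampled = rng.sample(range(len(clients)), n_hat)
    updates = [clients[i](theta) for i in sampled]
    aggregate = trimmed_mean(updates, b_hat)
    return [t + a for t, a in zip(theta, aggregate)]


if __name__ == "__main__":
    # Hypothetical setup: four honest clients minimizing 0.5 * (w - 1)^2,
    # and one Byzantine client that always sends a huge update.
    honest = [honest_client(lambda w: [w[0] - 1.0], K=5, lr=0.1) for _ in range(4)]
    clients = honest + [lambda theta: [1e6]]
    theta = [0.0]
    rng = random.Random(0)
    for _ in range(20):
        theta = fedro_round(theta, clients, n_hat=5, b_hat=1, rng=rng)
    print(theta)
```

In this toy setup every client is sampled, so the single outlier is always trimmed. With `n_hat` smaller than the number of clients, the Byzantine share of a given sample can exceed the population share, which is the subsampling effect the abstract warns about.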
