Federated Learning in the Presence of Adversarial Client Unavailability

Lili Su; Ming Xiang; Jiaming Xu; Pengkun Yang

TL;DR

The paper addresses federated learning under adversarial, history-dependent client unavailability by introducing an $\epsilon$-adversary dropout model and analyzing simple FedAvg/FedProx variants under $(B,G)$-bounded dissimilarity. It establishes near-optimal estimation-error bounds for non-convex and strongly convex objectives, showing error scales as $\epsilon(G^2+\sigma^2)$ or $\epsilon(G^2+\sigma^2)/\mu^2$, respectively, and proves minimax lower bounds that match up to constants. It derives convergence rates of $O(1/\sqrt{T})$ for non-convex and $O(1/T)$ for strongly convex cases, demonstrating these are optimal for first-order methods with noisy gradients. Theoretical results are complemented by extensive experiments on CIFAR-10, Shakespeare NLP, and synthetic data, showing the proposed FedAvg/FedProx variants outperform baselines under adversarial dropout while remaining memory-efficient and scalable. Overall, the work provides a rigorous treatment of adversarial client unavailability and offers practical, provably robust FL algorithms for challenging real-world deployments.

Abstract

Federated learning is a decentralized machine learning framework that enables collaborative model training without revealing raw data. Owing to diverse hardware and software limitations, a client may not always be available to serve the computation requests from the parameter server. An emerging line of research is devoted to tackling arbitrary client unavailability. However, existing work still imposes structural assumptions on the unavailability patterns, which limits its applicability in challenging scenarios where the unavailability patterns are beyond the control of the parameter server. Moreover, in harsh environments such as battlefields, adversaries can selectively and adaptively silence specific clients. In this paper, we relax the structural assumptions and consider adversarial client unavailability. To quantify the degree of client unavailability, we use the notion of $\epsilon$-adversary dropout fraction. We show that simple variants of FedAvg or FedProx, albeit completely agnostic to $\epsilon$, converge to an estimation error on the order of $\epsilon(G^2 + \sigma^2)$ for non-convex global objectives and $\epsilon(G^2 + \sigma^2)/\mu^2$ for $\mu$-strongly convex global objectives, where $G$ is a heterogeneity parameter and $\sigma^2$ is the noise level. Conversely, we prove that any algorithm has to suffer an estimation error of at least $\epsilon(G^2 + \sigma^2)/8$ for non-convex global objectives and $\epsilon(G^2 + \sigma^2)/(8\mu^2)$ for $\mu$-strongly convex global objectives. Furthermore, the convergence speeds of the FedAvg or FedProx variants are $O(1/\sqrt{T})$ for non-convex objectives and $O(1/T)$ for strongly convex objectives, both of which are the best possible for any first-order method that only has access to noisy gradients.
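To make the setting concrete, here is a minimal toy simulation. It is not the paper's algorithm or experimental setup. It assumes quadratic local objectives $F_i(\theta) = \tfrac{1}{2}\|\theta - c_i\|^2$, plain averaging over whichever clients respond, and a hypothetical adversary that silences an $\epsilon$ fraction of clients each round after inspecting their gradients. The names `fedavg_under_dropout`, `stationarity_error`, and `centers` are illustrative, and `eps` must be strictly below 1 so that at least one client responds.

```python
import numpy as np

def fedavg_under_dropout(centers, eps, rounds=200, lr=0.1, noise_std=0.0, seed=0):
    """Toy FedAvg on F_i(theta) = 0.5 * ||theta - centers[i]||^2 with adversarial dropout."""
    rng = np.random.default_rng(seed)
    num_clients, dim = centers.shape
    num_dropped = int(round(eps * num_clients))  # adversary's per-round budget
    theta = np.zeros(dim)
    for _ in range(rounds):
        # Noisy local gradients at the current global model, one row per client.
        grads = theta - centers + noise_std * rng.normal(size=centers.shape)
        # Hypothetical adversary: inspect the gradients and silence the clients
        # whose first gradient coordinate is most negative (they pull theta[0] up).
        order = np.argsort(grads[:, 0])
        available = order[num_dropped:]
        # Each available client takes one local step; the server averages the results.
        local_models = theta - lr * grads[available]
        theta = local_models.mean(axis=0)
    return theta

def stationarity_error(theta, centers):
    """Squared gradient norm of the global objective F = mean_i F_i at theta."""
    return float(np.sum((theta - centers.mean(axis=0)) ** 2))

centers = np.random.default_rng(1).normal(size=(20, 2))
for eps in (0.0, 0.1, 0.3):
    theta = fedavg_under_dropout(centers, eps)
    print(f"eps={eps:.1f}  ||grad F(theta)||^2 = {stationarity_error(theta, centers):.4f}")
```

In this toy, the model converges to the average of the surviving clients' minimizers, so the residual error reflects how far the adversary can bias that average. This is consistent in spirit with an error floor that depends on $\epsilon$ and the heterogeneity of the local objectives, but it does not reproduce the paper's bounds.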

Paper Structure (37 sections, 8 theorems, 98 equations, 13 figures, 1 table)

Introduction
Related work
Partial client participation.
Byzantine-resilient distributed and federated learning.
System model
Algorithms and convergence guarantees
Non-convex functions
Strongly convex functions
Proofs of the main convergence guarantees
Key challenges
Objective inconsistency
Selection bias
Combining together
Minimax lower bounds
Proof sketch and comparison of lower bounds
...and 22 more sections

Key Result

Theorem 1.1

Let $\sigma$ be the average noise level of the stochastic gradients. For $\sqrt{\epsilon} B \le 0.1$, where $\sup_{F_1, \cdots, F_M}$ is taken over all local objectives that collectively satisfy the $(B,G)$-heterogeneity condition, $\mathcal{A}$ is the set of all adversarial client unavailability patterns subject to the $\epsilon$-adversary dropout fraction, and $\inf_{\widehat{\theta}}$ is taken over all algor

Figures (13)

Figure 1: Populations generated from a Dirichlet distribution ($\alpha=0.1$) with different numbers of clients. Each row shows the empirical distribution of local data over classes. Colors correspond to class labels.
Figure 2: CIFAR-10 results with Dirichlet parameter $\alpha=0.1$ and dropout fraction $\epsilon=0.8$ under the adversarial client unavailability scheme of Section \ref{subsec: adversarial client unavailability}.
Figure 3: Natural language processing task with dropout fraction $\epsilon=0.7$ under a different adversarial client unavailability scheme, where the adversary inspects each client's local gradient improvement and removes the clients with the greatest improvements, subject to Assumption \ref{ass:adversarial}. Details can be found in Section \ref{subsec: NLP}.
Figure 4: Synthetic datasets: histogram of clients' local data volumes.
Figure 5: Synthetic datasets: comparison with baselines at dropout fraction $\epsilon=0.9$ under the adversarial client unavailability scheme of Section \ref{subsec: adversarial client unavailability}.
...and 8 more figures
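
The Figure 1 caption refers to client populations drawn from a Dirichlet distribution with $\alpha=0.1$. A common way to produce such label-skewed populations is sketched below. It assumes one class-proportion vector per client drawn from a symmetric Dirichlet, which may differ from the paper's exact partitioning procedure; the function name and defaults are illustrative.

```python
import numpy as np

def sample_client_class_distributions(num_clients, num_classes=10, alpha=0.1, seed=0):
    """Draw one class-proportion vector per client from a symmetric Dirichlet(alpha)."""
    rng = np.random.default_rng(seed)
    # Each row is non-negative and sums to 1: one client's distribution over classes.
    return rng.dirichlet(alpha * np.ones(num_classes), size=num_clients)

props = sample_client_class_distributions(num_clients=5)
print(np.round(props, 2))
```

A small $\alpha$ concentrates each client's data on a few classes, while a large $\alpha$ yields nearly uniform class proportions across clients.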

Theorems & Definitions (16)

Theorem 1.1: Informal
Theorem 4.3
Corollary 4.4
Theorem 4.5
Remark 4.6: Convex objective functions
Lemma 5.1
Lemma 5.2
Theorem 6.1
Remark 6.2: The impact of dissimilarity parameter $B$
Remark 6.3: Convergence rate in $T$
...and 6 more
