Empowering Federated Learning with Implicit Gossiping: Mitigating Connection Unreliability Amidst Unknown and Arbitrary Dynamics

Ming Xiang; Stratis Ioannidis; Edmund Yeh; Carlee Joe-Wong; Lili Su

TL;DR

The paper tackles federated learning under stochastic, time-varying, and unknown uplink failures, modeled by link probabilities $p_i^t$; FedAvg is biased when the $p_i^t$ differ across clients. It proposes Federated Postponed Broadcast (FedPBC), which delays the global broadcast to the end of each round. This enables implicit gossiping among the active clients through a time-varying mixing matrix $W^{(t)}$, and the paper proves convergence to a stationary point of the non-convex objective without requiring balanced participation. The analysis bounds the perturbation caused by nonuniform uplink availability via ergodicity-driven consensus, giving an $O(1/\sqrt{T})$ convergence rate under standard smoothness and gradient-noise assumptions. Experiments on real-world datasets validate robustness across diverse unreliable uplink patterns. Overall, FedPBC offers a practical, theory-backed approach to mitigating communication unreliability in federated learning with unknown and arbitrary dynamics, and it improves on FedAvg in many regimes.

Abstract

Federated learning is a popular distributed learning approach for training a machine learning model without disclosing raw data. It consists of a parameter server and a possibly large collection of clients (e.g., in cross-device federated learning) that may operate in congested and changing environments. In this paper, we study federated learning in the presence of stochastic and dynamic communication failures wherein the uplink between the parameter server and client $i$ is on with unknown probability $p_i^t$ in round $t$. Furthermore, we allow the dynamics of $p_i^t$ to be arbitrary. We first demonstrate that when the $p_i^t$'s vary across clients, the most widely adopted federated learning algorithm, Federated Average (FedAvg), experiences significant bias. To address this observation, we propose Federated Postponed Broadcast (FedPBC), a simple variant of FedAvg. FedPBC differs from FedAvg in that the parameter server postpones broadcasting the global model till the end of each round. Despite uplink failures, we show that FedPBC converges to a stationary point of the original non-convex objective. On the technical front, postponing the global model broadcasts enables implicit gossiping among the clients with active links in round $t$. Despite the time-varying nature of $p_i^t$, we can bound the perturbation of the global model dynamics using techniques to control gossip-type information mixing errors. Extensive experiments have been conducted on real-world datasets over diversified unreliable uplink patterns to corroborate our analysis.
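The postponed broadcast described above can be illustrated with a small sketch. It is not the paper's Algorithm 1, which is not reproduced here; it assumes scalar quadratic local objectives $F_i(x) = \frac{1}{2}(x - u_i)^2$, a single exact local gradient step per round, and independent Bernoulli uplinks. The helper names `mixing_matrix` and `fedpbc_round` are hypothetical.

```python
import random

def mixing_matrix(active, m):
    """Mixing matrix W for one round (illustrative assumption).

    Clients with an active uplink all receive the average of the active
    clients' models; clients with a failed uplink keep their own model.
    """
    W = [[0.0] * m for _ in range(m)]
    for i in range(m):
        if i in active:
            for j in active:
                W[i][j] = 1.0 / len(active)
        else:
            W[i][i] = 1.0
    return W

def fedpbc_round(x, u, p, eta, rng):
    """One sketched round on the assumed objectives F_i(x) = 0.5 * (x - u_i)^2."""
    m = len(x)
    # Every client takes one exact gradient step from its own local model.
    local = [x[i] - eta * (x[i] - u[i]) for i in range(m)]
    # The uplink of client i is on with probability p[i].
    active = {i for i in range(m) if rng.random() < p[i]}
    if not active:
        # No uplink succeeded: nothing to average, clients keep local models.
        return local, None
    W = mixing_matrix(active, m)
    # Postponed broadcast: mixing is applied at the end of the round.
    mixed = [sum(W[i][j] * local[j] for j in range(m)) for i in range(m)]
    x_ps = sum(local[j] for j in active) / len(active)
    return mixed, x_ps

if __name__ == "__main__":
    rng = random.Random(0)
    x = [0.0, 0.0]
    for _ in range(200):
        x, x_ps = fedpbc_round(x, u=[0.0, 100.0], p=[0.5, 0.9], eta=0.1, rng=rng)
    print(x, x_ps)
```

In this sketch, `W` is symmetric and doubly stochastic for every active set, which is the property that gossip-style averaging arguments typically rely on. Inactive clients continue from their own local model rather than from a stale global one.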

Paper Structure (17 sections, 9 theorems, 70 equations, 9 figures, 2 tables, 1 algorithm)

Introduction
Related Work
Client Unavailability
Bias Correction in Distributed Learning
Problem Formulation
A Case Study on the Bias of FedAvg
Algorithm: Federated Postponed Broadcast (FedPBC)
Convergence Analysis
Assumptions
Convergence Results
Numerical Experiments
Quadratic function
Real-world Datasets
Proofs of Selected Results
Conclusion
...and 2 more sections

Key Result

Proposition 1

Choose ${\bm x}^0 = \bm{0}$ and $\eta_t = \eta \in (0,1)$ for all $t$. Consider the global objective of Eq. (counterexample global objective) with $p_i^t = p_i$ for all $t$. Under FedAvg with exact local gradients, the expected output $\lim_{T\to\infty} \mathbb{E}\left[ x^T \right]$ has a closed form (Eq. (bias optimum), specialized to two clients in Figure 2), where ${\mathcal{B}}_j \triangleq \left\{ S \mid S\subseteq [m]\setminus\left\{ j \right\}, \left| S \right| = j-1 \right\}.$
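The two-client value quoted in the Figure 2 caption, $\left( 150 \cdot p_2 \right)/\left( p_2 + 1 \right)$, can be checked numerically with a short sketch. The model below is an assumption, since the counterexample objective is not reproduced here: local objectives $F_i(x) = \frac{1}{2}(x - u_i)^2$, active clients start each round from the global model and take one exact gradient step, and the server averages the active clients' models (keeping its model if no uplink is on). Under that model the fixed point of the expected update does not depend on $\eta$. The function name `fedavg_expected_limit` is hypothetical.

```python
from itertools import product

def fedavg_expected_limit(u, p):
    """Fixed point of the expected FedAvg update under the assumed model.

    Equals E[mean of u_i over the active set; set non-empty] divided by
    P(active set non-empty). Requires at least one p_i > 0.
    """
    m = len(u)
    weighted_mean = 0.0  # sum over non-empty active sets of P(set) * mean(u over set)
    prob_nonempty = 0.0  # probability that at least one uplink is on
    for mask in product([0, 1], repeat=m):
        if not any(mask):
            continue
        prob = 1.0
        for on, p_i in zip(mask, p):
            prob *= p_i if on else 1.0 - p_i
        active_mean = sum(u_i for on, u_i in zip(mask, u) if on) / sum(mask)
        weighted_mean += prob * active_mean
        prob_nonempty += prob
    return weighted_mean / prob_nonempty

if __name__ == "__main__":
    for p2 in (0.1, 0.5, 0.9):
        limit = fedavg_expected_limit(u=[0.0, 100.0], p=[0.5, p2])
        print(p2, limit, 150 * p2 / (p2 + 1))
```

With $u_1 = 0$, $u_2 = 100$ and $p_1 = 0.5$, the enumeration agrees with the caption's expression, and it returns the global minimizer $50$ only when $p_2 = 0.5$.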

Figures (9)

Figure 1: A federated learning system with moving autonomous vehicles as clients. The signal strength of the vehicles indicates the communication conditions.
Figure 2: A visualization of the expected output of the FedAvg algorithm with two clients, where $u_1 = 0$, $u_2 = 100$, and $p_1=0.5$. We vary $p_2\in [0,1]$ (shown on the $x$-axis). Eq. (bias optimum) becomes $\lim_{T\to\infty} \mathbb{E}\left[ x^T \right] = \left( 150 \cdot p_2 \right)/\left( p_2 + 1 \right)$. The $y$-axis is the expected output of FedAvg. When $p_2=0.5$, FedAvg recovers the global minimizer $(u_1+u_2)/2 = 50$. The expected output of FedAvg can deviate far from the global minimizer when $p_1 \neq p_2$.
Figure 3: $\left\|{{\bm x}_{\text{PS}} - {\bm x}^\star} \right\|_2$ in logarithmic scale. The results are averaged over $3$ random seeds and reported as mean $\pm$ standard deviation; the shaded areas show the standard deviation.
Figure 4: The construction of $p_i$'s.
Figure 5: Illustrations of the unreliable communication schemes evaluated in the Real-world Datasets section.
...and 4 more figures

Theorems & Definitions (17)

Proposition 1
Proposition 2
Lemma 1: Lemma 1 in su2023federatedadv
Remark 1
Lemma 2: Descent Lemma
Lemma 3: Ergodicity
Lemma 4: Consensus Error
Theorem 1
Corollary 1
Remark 2
...and 7 more
