Empowering Federated Learning with Implicit Gossiping: Mitigating Connection Unreliability Amidst Unknown and Arbitrary Dynamics
Ming Xiang, Stratis Ioannidis, Edmund Yeh, Carlee Joe-Wong, Lili Su
TL;DR
The paper studies federated learning under stochastic, time-varying, and unknown uplink failures, modeled by link probabilities $p_i^t$, and shows that FedAvg is biased when the $p_i^t$ differ across clients. It proposes Federated Postponed Broadcast (FedPBC), which postpones the global-model broadcast to the end of each round. This enables implicit gossiping among the active clients through a time-varying mixing matrix $W^{(t)}$, and the paper proves convergence to a stationary point of the non-convex objective without requiring balanced participation. The analysis bounds the perturbation caused by nonuniform uplink availability via ergodicity-driven consensus, giving an $O(1/\sqrt{T})$ convergence rate under standard smoothness and gradient-noise assumptions. Experiments on real-world datasets validate robustness across diverse unreliable uplink patterns. Overall, FedPBC offers a practical, theory-backed approach to mitigating communication unreliability in federated learning with unknown and arbitrary dynamics, improving on FedAvg in many regimes.
Abstract
Federated learning is a popular distributed learning approach for training a machine learning model without disclosing raw data. It consists of a parameter server and a possibly large collection of clients (e.g., in cross-device federated learning) that may operate in congested and changing environments. In this paper, we study federated learning in the presence of stochastic and dynamic communication failures wherein the uplink between the parameter server and client $i$ is active with unknown probability $p_i^t$ in round $t$. Furthermore, we allow the dynamics of $p_i^t$ to be arbitrary. We first demonstrate that when the $p_i^t$'s vary across clients, the most widely adopted federated learning algorithm, Federated Averaging (FedAvg), experiences significant bias. To address this observation, we propose Federated Postponed Broadcast (FedPBC), a simple variant of FedAvg. FedPBC differs from FedAvg in that the parameter server postpones broadcasting the global model until the end of each round. Despite uplink failures, we show that FedPBC converges to a stationary point of the original non-convex objective. On the technical front, postponing the global model broadcasts enables implicit gossiping among the clients with active links in round $t$. Despite the time-varying nature of $p_i^t$, we can bound the perturbation of the global model dynamics using techniques to control gossip-type information mixing errors. Extensive experiments on real-world datasets over diverse unreliable uplink patterns corroborate our analysis.
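The implicit-gossiping idea can be illustrated with a toy simulation. The sketch below is not the authors' implementation; it rests on several assumptions that are not stated in the summary above: each client $i$ holds a scalar quadratic loss $\frac{1}{2}(x - c_i)^2$, every client takes one local gradient step per round regardless of its link status, the link probabilities are held constant over time for simplicity, and in the FedPBC-style variant the server's end-of-round average is received only by the clients whose uplinks were active, while inactive clients keep their own local models. Under these assumptions the round-$t$ mixing matrix averages the active clients among themselves and is the identity on the rest. The names `mixing_matrix` and `simulate` are hypothetical.

```python
import numpy as np


def mixing_matrix(active):
    """Assumed round-t mixing matrix: active clients average among
    themselves; inactive clients keep their own model (identity rows)."""
    active = np.asarray(active, dtype=bool)
    W = np.eye(active.size)
    idx = np.flatnonzero(active)
    if idx.size > 0:
        # Block of 1/k over the k active clients; rows and columns still sum to 1.
        W[np.ix_(idx, idx)] = 1.0 / idx.size
    return W


def simulate(c, p, lr=0.1, rounds=2000, seed=0):
    """Compare a FedAvg-style and a FedPBC-style update on toy quadratics.

    c[i] is the minimizer of client i's loss 0.5 * (x - c[i])**2, so the
    global optimum is mean(c). p[i] is client i's (here constant) uplink
    probability.
    """
    rng = np.random.default_rng(seed)
    c = np.asarray(c, dtype=float)
    p = np.asarray(p, dtype=float)
    m = c.size

    x_avg = 0.0          # FedAvg-style: a single global model
    x_pbc = np.zeros(m)  # FedPBC-style: one local model per client
    avg_trace = []

    for _ in range(rounds):
        active = rng.random(m) < p  # which uplinks are on this round

        # FedAvg-style: clients start from the broadcast global model,
        # and the server averages only the updates it receives.
        if active.any():
            local = x_avg - lr * (x_avg - c)
            x_avg = local[active].mean()
        avg_trace.append(x_avg)

        # FedPBC-style: clients start from their own models, then the
        # active ones are mixed at the end of the round.
        x_pbc = x_pbc - lr * (x_pbc - c)
        x_pbc = mixing_matrix(active) @ x_pbc

    # Time-average the FedAvg model over the second half of the run to
    # smooth out round-to-round noise; report the client average for FedPBC.
    return float(np.mean(avg_trace[rounds // 2:])), float(x_pbc.mean())


if __name__ == "__main__":
    fedavg, fedpbc = simulate(c=[0, 0, 10, 10], p=[0.9, 0.9, 0.1, 0.1])
    print("global optimum:", 5.0)
    print("FedAvg-style (time-averaged):", fedavg)
    print("FedPBC-style (client average):", fedpbc)
```

In this toy setting the FedAvg-style model is pulled toward the minimizers of the frequently connected clients, whereas the assumed mixing matrix is doubly stochastic and therefore preserves the average of the local models, so that average follows the gradient of the true global objective. The paper's setting is more general (non-convex losses, stochastic gradients, time-varying $p_i^t$), so this only conveys the intuition.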
