Achieving Linear Speedup in Asynchronous Federated Learning with Heterogeneous Clients

Xiaolu Wang; Zijian Li; Shi Jin; Jun Zhang

TL;DR

This work tackles the straggler problem in federated learning with heterogeneous clients by introducing Delayed Federated Averaging (DeFedAvg), an asynchronous framework in which clients keep training locally on delayed (stale) global models, using client-side receive buffers and server-side updates. It defines two variants: DeFedAvg-nIID, for non-IID data with random client sampling, and DeFedAvg-IID, for IID data, where the server caches the first-arrived updates. Both variants come with nonconvex convergence guarantees that exhibit explicit linear speedup in the number of participating clients $n$. The rates are $\text{O}\bigl(1/\sqrt{nT} + 1/(KT)\bigr)$ for DeFedAvg-nIID and $\text{O}\bigl(1/\sqrt{nKT} + 1/(KT)\bigr)$ for DeFedAvg-IID. Experiments on FashionMNIST and CIFAR-10 show faster wall-clock convergence and favorable scaling with $n$ and $K$, supporting DeFedAvg as a practical and scalable AFL approach for heterogeneous federated systems.
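
To make the linear speedup claim concrete, the short calculation below shows what a leading term proportional to $1/\sqrt{nT}$ would imply for the number of rounds needed. The constant $C$ and the target accuracy $\varepsilon$ are illustrative symbols that do not come from the paper, $T$ is read here as the number of global rounds (an assumption about the notation), and lower-order terms are ignored.

```latex
\frac{C}{\sqrt{nT}} \le \varepsilon
\quad\Longleftrightarrow\quad
T \ge \frac{C^2}{n\,\varepsilon^2}
```

Under these assumptions, doubling $n$ halves the number of rounds required to reach the same accuracy, which is the sense in which the speedup is linear in $n$.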

Abstract

Federated learning (FL) is an emerging distributed training paradigm that aims to learn a common global model without exchanging or transferring the data that are stored locally at different clients. The Federated Averaging (FedAvg)-based algorithms have gained substantial popularity in FL to reduce the communication overhead, where each client conducts multiple localized iterations before communicating with a central server. In this paper, we focus on FL where the clients have diverse computation and/or communication capabilities. Under this circumstance, FedAvg can be less efficient since it requires all clients that participate in the global aggregation in a round to initiate iterations from the latest global model, and thus the synchronization among fast clients and straggler clients can severely slow down the overall training process. To address this issue, we propose an efficient asynchronous federated learning (AFL) framework called Delayed Federated Averaging (DeFedAvg). In DeFedAvg, the clients are allowed to perform local training with different stale global models at their own paces. Theoretical analyses demonstrate that DeFedAvg achieves asymptotic convergence rates that are on par with the results of FedAvg for solving nonconvex problems. More importantly, DeFedAvg is the first AFL algorithm that provably achieves the desirable linear speedup property, which indicates its high scalability. Additionally, we carry out extensive numerical experiments using real datasets to validate the efficiency and scalability of our approach when training deep neural networks.
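
The idea of training on stale global models can be illustrated with a toy simulation. The sketch below is not the paper's algorithm: it is a minimal, hypothetical model in which each client holds a scalar quadratic objective, starts from a randomly delayed copy of the global model, runs `K` noisy local SGD steps, and the server averages the resulting model changes. All names (`local_sgd`, `delayed_fedavg`, `max_delay`) and parameter values are assumptions made for illustration, and receive buffers and real communication are not modeled.

```python
import random


def local_sgd(w_start, center, K, lr, noise, rng):
    """Run K noisy SGD steps on f(w) = 0.5 * (w - center)^2; return the model change."""
    w = w_start
    for _ in range(K):
        grad = (w - center) + rng.gauss(0.0, noise)  # stochastic gradient
        w -= lr * grad
    return w - w_start


def delayed_fedavg(centers, n, K, T, lr=0.05, max_delay=3, noise=0.1, seed=0):
    """Toy delayed averaging: clients train from stale global models (illustrative only)."""
    rng = random.Random(seed)
    history = [0.0]  # history[t] is the global model after t server rounds
    for t in range(T):
        sampled = rng.sample(range(len(centers)), n)  # clients contributing this round
        deltas = []
        for i in sampled:
            # Hypothetical staleness: client i holds a model up to max_delay rounds old.
            delay = rng.randint(0, min(max_delay, t))
            stale_model = history[t - delay]
            deltas.append(local_sgd(stale_model, centers[i], K, lr, noise, rng))
        # The server applies the average of the clients' model changes.
        history.append(history[t] + sum(deltas) / n)
    return history


if __name__ == "__main__":
    centers = [float(i) for i in range(10)]  # the average objective is minimized at 4.5
    final = delayed_fedavg(centers, n=10, K=5, T=300)[-1]
    print(f"final global model: {final:.3f}")
```

With every client participating, the global model in this toy setting settles close to the minimizer of the average objective even though each update is computed from an outdated model. Sampling fewer clients (`n < len(centers)`) adds sampling noise on top of the staleness.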

Paper Structure (36 sections, 12 theorems, 83 equations, 7 figures, 2 tables, 2 algorithms)

Introduction
  Our Contributions
Related Works
  Linear Speedup Analyses of FedAvg Algorithms
  Asynchronous SGD Algorithms
  AFL Algorithms
Delayed Federated Averaging
  DeFedAvg-nIID
    Clients' Procedures
    Server's Procedures
  DeFedAvg-IID
    Clients' Procedures
    Server's Procedures
  Practical Issues of DeFedAvg
Theoretical Analyses
...and 21 more sections

Key Result

Theorem 1

Let $F^* \in \mathbb{R}$ be the optimal function value of Problem eq:prob, $\bm{w}^0 \in \mathbb{R}^d$ be the initial global model of Algorithm algo2, and $\{\bm{w}^t\}_{t \geq 1}$ be the sequence of global models generated by Algorithm algo2. Let $\eta = \sqrt{4 n K (F(\bm{w}^0) - F^*)}$ and [...]. Then, it holds for $T \geq 1$ that [...]

Figures (7)

Figure 1: Comparison of clients' local training protocols of (synchronous) FedAvg, DeFedAvg-nIID, and DeFedAvg-IID.
Figure 2: Convergence over wall-clock time of DeFedAvg and other algorithms with $n=10$.
Figure 3: Convergence over wall-clock time of DeFedAvg and other algorithms with $n=20$.
Figure 4: Convergence over wall-clock time of DeFedAvg and other algorithms with $n=40$.
Figure 5: Convergence over wall-clock time of DeFedAvg and other algorithms with $n=80$.
...and 2 more figures

Theorems & Definitions (25)

Theorem 1
Theorem 2
proof
Lemma 1
proof
Lemma 2
proof
Lemma 3
proof
Lemma 4
...and 15 more
