FedCompass: Efficient Cross-Silo Federated Learning on Heterogeneous Client Devices using a Computing Power Aware Scheduler

Zilinghan Li; Pranshu Chaturvedi; Shilan He; Han Chen; Gagandeep Singh; Volodymyr Kindratenko; E. A. Huerta; Kibaek Kim; Ravi Madduri

TL;DR

FedCompass is a semi-asynchronous federated learning algorithm with a computing power-aware scheduler on the server side. The scheduler adaptively assigns different amounts of training work to different clients based on each client's computing power.

Abstract

Cross-silo federated learning offers a promising way to collaboratively train robust and generalized AI models without compromising the privacy of local datasets, for example in healthcare, finance, and scientific projects that lack a centralized data facility. Nonetheless, because computing resources differ among clients (i.e., device heterogeneity), synchronous federated learning algorithms lose efficiency while waiting for straggler clients. Asynchronous federated learning algorithms, in turn, suffer from a degraded convergence rate and lower final model accuracy on non-independent and identically distributed (non-IID) heterogeneous datasets because of stale local models and client drift. To address these limitations in cross-silo federated learning with heterogeneous clients and data, we propose FedCompass, an innovative semi-asynchronous federated learning algorithm with a computing power-aware scheduler on the server side, which adaptively assigns varying amounts of training tasks to different clients using knowledge of each client's computing power. FedCompass ensures that multiple locally trained models from clients are received almost simultaneously as a group for aggregation, effectively reducing the staleness of local models. At the same time, the overall training process remains asynchronous, eliminating prolonged waiting periods caused by straggler clients. Using diverse non-IID heterogeneous distributed datasets, we demonstrate that FedCompass achieves faster convergence and higher accuracy than other asynchronous algorithms while remaining more efficient than synchronous algorithms when performing federated learning on heterogeneous clients. The source code for FedCompass is available at https://github.com/APPFL/FedCompass.
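The scheduling idea in the abstract (give faster clients more local steps so that a group of clients reports back at about the same time) can be illustrated with a minimal sketch. This is not the authors' implementation. The function name `assign_local_steps`, the `step_times` input, and the choice to anchor the shared deadline on the fastest client running `q_max` steps are assumptions made for illustration; the defaults 20 and 100 mirror the $Q_{\min}$ and $Q_{\max}$ values quoted in the Figure 1 caption.

```python
import math


def assign_local_steps(step_times, q_min=20, q_max=100):
    """Pick a local step count per client so that clients finish together.

    step_times: dict mapping a client id to its estimated seconds per
    local step (a hypothetical input; the real scheduler estimates this).
    Returns (plan, late) where plan maps client id -> local steps and
    late lists clients that cannot meet the deadline even with q_min steps.
    """
    if not step_times:
        return {}, []

    # Assumption: the fastest client runs q_max steps and its finish time
    # becomes the shared arrival deadline for the group.
    deadline = q_max * min(step_times.values())

    plan, late = {}, []
    for client, seconds_per_step in step_times.items():
        # Number of steps this client can complete before the deadline.
        affordable = math.floor(deadline / seconds_per_step)
        if affordable < q_min:
            # Too slow for this group; a full scheduler would handle
            # such clients separately instead of making others wait.
            late.append(client)
        # Clamp into the allowed range [q_min, q_max].
        plan[client] = max(q_min, min(q_max, affordable))
    return plan, late


if __name__ == "__main__":
    speeds = {"a": 0.25, "b": 0.5, "c": 1.0, "d": 2.0}
    print(assign_local_steps(speeds))
```

In this sketch, client `d` is eight times slower than client `a`, which exceeds the ratio $Q_{\max}/Q_{\min}=5$, so it is flagged as late instead of delaying the group.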

Paper Structure (34 sections, 4 theorems, 25 equations, 22 figures, 22 tables, 5 algorithms)

Introduction
Related Work
Proposed Method: FedCompass
Compass: Computing Power-Aware Scheduler
FedCompass: Federated Learning with Compass
Convergence Analysis
Experiments
Experiment Setup
Experiment Results
Conclusion
Detailed Implementation of FedCompass
Upper Bound on the Number of Groups
FedCompass Convergence Analysis
List of Notations
Proof of Theorem 1
...and 19 more sections

Key Result

Theorem 1

Suppose that $\eta_\ell\leq\frac{1}{2LQ_{\max}}$, $\textit{Q}=Q_{\max}/Q_{\min}$, and $\mu'=\textit{Q}^{\lfloor\log_{\textit{Q}}\mu\rfloor}$. Then, after $T$ updates of the global model $w$, FedCompass achieves a convergence rate stated in terms of the following quantities: $m$ is the number of clients, $w^{(t)}$ is the global model after $t$ global updates, $\gamma_1=1+\frac{m-1}{\mu'}$, $\gamma_2=1+\mu'(m-1)$, $F^*=F(w^{(0)})\,\ldots$
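To make the constants in Theorem 1 concrete, the following sketch evaluates $\textit{Q}$, $\mu'$, $\gamma_1$, $\gamma_2$, and the step-size bound $\frac{1}{2LQ_{\max}}$ for given inputs. The function name `theorem_constants` is hypothetical. The sketch treats $\mu$ as a given input with $\mu\geq 1$ and assumes $Q_{\max}>Q_{\min}$, since neither condition is spelled out in this excerpt. The exponent $\lfloor\log_{\textit{Q}}\mu\rfloor$ is found with a loop instead of a floating-point logarithm to avoid rounding errors at exact powers.

```python
def theorem_constants(q_min, q_max, mu, m, lipschitz):
    """Evaluate the constants that appear in the statement of Theorem 1.

    Assumes q_max > q_min > 0, mu >= 1, m >= 1, and lipschitz > 0.
    """
    if not (q_max > q_min > 0) or mu < 1 or m < 1 or lipschitz <= 0:
        raise ValueError("inputs violate the assumptions of this sketch")

    q_ratio = q_max / q_min  # Q = Q_max / Q_min

    # Largest integer k with Q**k <= mu, i.e. floor(log_Q(mu)).
    k = 0
    while q_ratio ** (k + 1) <= mu:
        k += 1
    mu_prime = q_ratio ** k

    gamma_1 = 1 + (m - 1) / mu_prime
    gamma_2 = 1 + mu_prime * (m - 1)
    max_local_lr = 1 / (2 * lipschitz * q_max)  # bound on eta_l

    return {
        "Q": q_ratio,
        "mu_prime": mu_prime,
        "gamma_1": gamma_1,
        "gamma_2": gamma_2,
        "max_local_lr": max_local_lr,
    }


if __name__ == "__main__":
    print(theorem_constants(q_min=20, q_max=100, mu=30, m=5, lipschitz=1.0))
```

With $Q_{\min}=20$, $Q_{\max}=100$, $\mu=30$, and $m=5$, this gives $\textit{Q}=5$, $\mu'=25$, $\gamma_1=1.16$, and $\gamma_2=101$.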

Figures (22)

Figure 1: Overview of an example FL run using Compass scheduler on five clients with the minimum number of local steps $Q_{\min}=20$ and maximum number of local steps $Q_{\max}=100$.
Figure 2: Change in validation accuracy and standard deviation for different FL algorithms on the dual Dirichlet partitioned MNIST dataset and the class partitioned CIFAR-10 dataset with five clients and three client heterogeneity settings. (Synchronous algorithms take the same amount of time, and asynchronous algorithms take the same amount of time in the same experiment setting.)
Figure 3: Sample class distribution generated by the class partition strategy among ten clients.
Figure 4: Sample class distribution generated by the dual Dirichlet partition strategy among ten clients.
Figure 5: Sample distribution of the training time to complete one local step among 50 clients under (a) normal distribution with $\sigma=0.3\mu$ and (b) exponential distribution.
...and 17 more figures
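The client heterogeneity described in the Figure 5 caption (per-step training times drawn from a normal distribution with $\sigma=0.3\mu$ or from an exponential distribution) could be simulated along the following lines. This is a sketch under assumptions, not the paper's code: the function name `sample_step_times`, the use of $\mu$ as the exponential mean, and the resampling of non-positive draws are all illustrative choices.

```python
import random


def sample_step_times(num_clients, mean, distribution="normal", seed=0):
    """Draw a simulated time per local step for each client.

    distribution: "normal" uses standard deviation 0.3 * mean;
    "exponential" uses the given mean (an assumption of this sketch).
    """
    rng = random.Random(seed)  # seeded so runs are repeatable
    times = []
    while len(times) < num_clients:
        if distribution == "normal":
            t = rng.gauss(mean, 0.3 * mean)
        elif distribution == "exponential":
            t = rng.expovariate(1.0 / mean)
        else:
            raise ValueError(f"unknown distribution: {distribution}")
        # Assumption: a step time must be positive, so redraw otherwise.
        if t > 0:
            times.append(t)
    return times


if __name__ == "__main__":
    normal_times = sample_step_times(50, mean=1.0, distribution="normal")
    exp_times = sample_step_times(50, mean=1.0, distribution="exponential")
    print(min(normal_times), max(normal_times))
    print(min(exp_times), max(exp_times))
```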

Theorems & Definitions (4)

Theorem 1
Corollary 1
Lemma 1
Lemma 2
