DynamicFL: Federated Learning with Dynamic Communication Resource Allocation

Qi Le; Enmao Diao; Xinran Wang; Vahid Tarokh; Jie Ding; Ali Anwar

TL;DR

DynamicFL is a new federated learning (FL) framework that investigates the trade-offs between global model performance and communication costs for two widely adopted FL methods: Federated Stochastic Gradient Descent (FedSGD) and Federated Averaging (FedAvg).

Abstract

Federated Learning (FL) is a collaborative machine learning framework that allows multiple users to train models utilizing their local data in a distributed manner. However, considerable statistical heterogeneity in local data across devices often leads to suboptimal model performance compared with independently and identically distributed (IID) data scenarios. In this paper, we introduce DynamicFL, a new FL framework that investigates the trade-offs between global model performance and communication costs for two widely adopted FL methods: Federated Stochastic Gradient Descent (FedSGD) and Federated Averaging (FedAvg). Our approach allocates diverse communication resources to clients based on their data statistical heterogeneity, considering communication resource constraints, and attains substantial performance enhancements compared to uniform communication resource allocation. Notably, our method bridges the gap between FedSGD and FedAvg, providing a flexible framework leveraging communication heterogeneity to address statistical heterogeneity in FL. Through extensive experiments, we show that DynamicFL surpasses current state-of-the-art methods with up to a 10% increase in model accuracy, demonstrating its adaptability and effectiveness in tackling data statistical heterogeneity challenges.
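The idea of allocating communication resources by statistical heterogeneity can be illustrated with a minimal sketch. It is not the paper's DynaComm procedure: the scoring rule (KL divergence between a client's label distribution and the pooled label distribution), the fixed `budget` of high-frequency slots, and the names `label_distribution`, `kl_divergence`, `assign_intervals`, `high_interval`, and `low_interval` are all assumptions made for illustration. A communication interval of 1 mimics FedSGD-style synchronization at every step, while a longer interval mimics FedAvg-style local training.

```python
import math
from collections import Counter


def label_distribution(labels, num_classes):
    """Empirical label distribution of one client's local data."""
    counts = Counter(labels)
    total = len(labels)
    return [counts.get(c, 0) / total for c in range(num_classes)]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q); terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def assign_intervals(client_labels, num_classes, budget,
                     high_interval=1, low_interval=4):
    """Hypothetical allocation rule: the `budget` most heterogeneous clients
    communicate every `high_interval` local steps, the rest every
    `low_interval` steps. Returns (intervals, scores)."""
    # Pooled label distribution over all clients (assumed known here).
    pooled = [y for labels in client_labels.values() for y in labels]
    global_dist = label_distribution(pooled, num_classes)

    # Heterogeneity score: divergence of each client from the pooled data.
    scores = {
        cid: kl_divergence(label_distribution(labels, num_classes), global_dist)
        for cid, labels in client_labels.items()
    }

    # Most heterogeneous clients first; take as many as the budget allows.
    ranked = sorted(scores, key=lambda cid: scores[cid], reverse=True)
    high_group = set(ranked[:budget])

    intervals = {
        cid: (high_interval if cid in high_group else low_interval)
        for cid in client_labels
    }
    return intervals, scores


if __name__ == "__main__":
    # Toy example: client "a" holds a single class, "b" is balanced.
    clients = {"a": [0, 0, 0, 0], "b": [0, 1, 0, 1], "c": [1, 1, 1, 0]}
    intervals, scores = assign_intervals(clients, num_classes=2, budget=1)
    print(intervals)
```

In this toy setting the single-class client receives the short interval, since its label distribution diverges most from the pooled one. A real system would also need a privacy-respecting way to estimate the pooled distribution, which this sketch simply assumes is available.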

Paper Structure

This paper contains 35 sections, 1 theorem, 19 equations, 33 figures, 22 tables, and 1 algorithm.

Introduction
Motivation
Limitation of state-of-art approaches
Key insights and contributions
Experimental methodology and artifact availability
Related works
Regularized FL
Compression techniques
Asynchronous FL
Design
Problem setting
Local data statistical heterogeneity.
Communication resource
Dynamic Federated Learning
Overall design
...and 20 more sections

Key Result

Theorem 1

Assume the objective is as defined in (eq2). With the same learning rate $\eta \in (0,1)$ at each step for each client, the three-client DynamicFL and FedSGD algorithms will have the same rate of convergence at the same number of system-wide steps. More specifically, at $k \cdot r$ steps, where $k$ is where $\theta^*$ is the global minimum to (eq2) and $\theta_{0}$ is the initial model parameter.

Figures (33)

Figure 1: Experiment with ResNet-18 on CIFAR-10 in a statistically heterogeneous scenario: (a) each client holds data from one class label, (b) each client holds data from two class labels. A significant, unexplored performance-communication cost gap exists between FedSGD and FedAvg.
Figure 2: Overall Design: DynamicFL's operation over one global communication round. DynaComm identifies clients $2$ and $3$ for the high-frequency group based on their statistical heterogeneity and communication resource budgets, emulating FedSGD to guide optimizations towards global optima. The figure's right side shows the gradient update correction process for all four clients over four local steps, where $W_{1, 0}^t$ signifies client 1's model at round $t$ after $0$ local updates.
Figure 3: Learning curves for all communication interval combinations in Table \ref{tab:freq_ablation_cifar10_main}, $Dir(0.1)$ setting.
Figure 4: This figure illustrates the trend from Table \ref{tab:freq_ablation_cifar10_main}, comparing the high-frequency group only, extended intervals, and DynamicSGD, using a CNN with CIFAR-10. For the interval 'a', we depict a, a-b, a-c, a-d, a-e, a-g, and a*. The same pattern is replicated for intervals b, c, and d.
Figure 5: Comparison of KL-divergence and Runtime for CIFAR-100 with $Dir(0.1)$.
...and 28 more figures

Theorems & Definitions (5)

Remark 1: Two distinct groups
Remark 2: Bridging DynamicSGD and FedAvg
Remark 3: Efficient DynaComm
Theorem 1
proof
