Load Balancing in Federated Learning

Alireza Javani; Zhiying Wang

TL;DR

This work tackles load balancing in Federated Learning under partial participation by introducing a load metric $X$ tied to Age of Information (AoI) and aiming to minimize $\operatorname{Var}[X]$ under $P(S_i^{(t)}=1)=\frac{k}{n}$. It advocates a decentralized Markov scheduling policy where each client's AoI governs its participation, and derives optimal transition probabilities, showing equivalence to oldest-age selection in many regimes. Through simulations on MNIST, CIFAR-10, and CIFAR-100 with $n=100$, $k=15$, the Markov policy achieves faster convergence than random selection (e.g., CIFAR-10: 240 vs 265 rounds; CIFAR-100: 500 vs 600+ rounds) and improves fairness by reducing update staleness. The results highlight the practical benefits of minimizing $\operatorname{Var}[X]$ for robustness to data heterogeneity and dynamic network conditions.
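The effect of an age-driven policy on $\operatorname{Var}[X]$ can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes $X$ is the number of rounds between a client's consecutive participations, and it uses a two-state chain ($m=1$) in which a client that just participated joins again with probability $p_0$ and an idle client joins with probability $p_1$. The function name `participation_gaps` and the choice $p_0 = 0$ are illustrative only.

```python
import random
import statistics


def participation_gaps(p0, p1, rounds, rng):
    """Simulate one client for `rounds` rounds; return gaps between participations.

    Assumed two-state chain (m = 1), an illustrative reading of the policy:
      state 0: the client participated in the previous round, joins again w.p. p0
      state 1: the client has been idle for >= 1 round, joins w.p. p1
    """
    gaps = []
    state = 0        # treat the round before the simulation as a participation
    since_last = 0   # rounds elapsed since the last participation
    for _ in range(rounds):
        since_last += 1
        join_prob = p0 if state == 0 else p1
        if rng.random() < join_prob:
            gaps.append(since_last)
            state, since_last = 0, 0
        else:
            state = 1
    return gaps


n, k = 100, 15       # values quoted in the summary
rate = k / n
rounds = 200_000
rng = random.Random(0)

# Random selection: drawing a uniform k-subset each round means a given client
# joins independently across rounds with probability k/n, i.e. p0 = p1 = k/n.
random_gaps = participation_gaps(rate, rate, rounds, rng)

# Markov policy with the illustrative choice p0 = 0; p1 is then fixed by
# requiring the stationary participation rate p1 / (1 - p0 + p1) to equal k/n.
p0 = 0.0
p1 = rate * (1 - p0) / (1 - rate)
markov_gaps = participation_gaps(p0, p1, rounds, rng)

for name, gaps in [("random", random_gaps), ("markov", markov_gaps)]:
    print(name,
          "rate=%.3f" % (len(gaps) / rounds),
          "mean X=%.2f" % statistics.mean(gaps),
          "Var[X]=%.2f" % statistics.pvariance(gaps))
```

Both policies are tuned to the same participation rate $k/n$, so any difference lies in how evenly the load is spread over time. Under these assumptions the gap for random selection is geometric with variance $(1-k/n)/(k/n)^2 \approx 37.8$, while the $m=1$ variance expression in Theorem 1 evaluates to $238/9 \approx 26.4$ for $p_0 = 0$, $p_1 = 3/17$; the empirical values printed by the sketch should land near these figures.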

Abstract

Federated Learning (FL) is a decentralized machine learning framework that enables learning from data distributed across multiple remote devices, enhancing communication efficiency and data privacy. Due to limited communication resources, a scheduling policy is often applied to select a subset of devices for participation in each FL round. Scheduling is challenging because it must provide fair workload distribution, efficient resource utilization, and scalability in environments with numerous edge devices, all while the data is statistically heterogeneous across devices. This paper proposes a load metric for scheduling policies based on the Age of Information and addresses the above challenges by minimizing the variance of the load metric across clients. Furthermore, a decentralized Markov scheduling policy is presented that ensures a balanced workload distribution; because each client decides independently, the policy incurs no management overhead regardless of network size. We establish the optimal parameters of the Markov chain model and validate our approach through simulations. The results demonstrate that reducing the load metric variance not only promotes fairness and improves operational efficiency, but also enhances the convergence rate of the learning models.

Paper Structure (5 sections, 2 theorems, 26 equations, 4 figures)

Introduction
Problem Setting
Client Selection
Simulation Results
Conclusion

Key Result

Theorem 1

If $m=1$, the variance of $X$ is given by $\operatorname{Var}[X] = \frac{(1 + p_0 - p_1)(1 - p_0)}{p_1^2}$. Considering this, the optimal values of $p_0, p_1$ depend on the relationship between $k$ and $\frac{n}{2}$. Specifically:
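As a side check on the variance expression itself (separate from the case analysis on $k$ versus $\frac{n}{2}$), the formula can be recovered with a short calculation. This is a sketch under an assumed reading of the two-state chain: from state 0 (just selected) the client is selected again in the next round with probability $p_0$; otherwise it moves to state 1 and is then selected in each round with probability $p_1$. $X$ is taken to be the number of rounds between consecutive selections, and $B$, $G$ are auxiliary independent random variables introduced only for this derivation.

$$
\begin{aligned}
X &= 1 + B\,G, \qquad B \sim \mathrm{Bernoulli}(1-p_0),\quad G \sim \mathrm{Geometric}(p_1)\ \text{on}\ \{1,2,\dots\},\\
\mathbb{E}[G] &= \frac{1}{p_1}, \qquad \mathbb{E}[G^2] = \frac{2-p_1}{p_1^2}, \qquad \mathbb{E}[X] = 1 + \frac{1-p_0}{p_1},\\
\operatorname{Var}[X] &= \mathbb{E}[B]\,\mathbb{E}[G^2] - \big(\mathbb{E}[B]\,\mathbb{E}[G]\big)^2
 = \frac{(1-p_0)(2-p_1)}{p_1^2} - \frac{(1-p_0)^2}{p_1^2}
 = \frac{(1+p_0-p_1)(1-p_0)}{p_1^2}.
\end{aligned}
$$

Under the same reading, the long-run participation rate is $1/\mathbb{E}[X] = \frac{p_1}{1-p_0+p_1}$, which the constraint $P(S_i^{(t)}=1)=\frac{k}{n}$ would pin to $\frac{k}{n}$, leaving one free parameter over which $\operatorname{Var}[X]$ is minimized.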

Figures (4)

Figure 1: Markov chain with $m+1$ states.
Figure 2: Comparison of accuracy between our proposed method (orange) and the random client selection method (blue) on the CIFAR-10 dataset with IID data distribution. The simulation parameters are $n=100$, $k=15$, and $m=10$.
Figure 3: Comparison of accuracy between our proposed method (orange) and the random client selection method (blue) on the CIFAR-100 dataset with IID data distribution. The simulation parameters are $n=100$, $k=15$, and $m=10$.
Figure 4: Comparison of accuracy between our proposed method (orange) and the random client selection method (blue) on the MNIST dataset with IID (top) and non-IID (bottom) data distribution. The simulation parameters are $n=100$, $k=15$, and $m=10$.

Theorems & Definitions (6)

Theorem 1
proof
Theorem 2
proof
Remark 1
Remark 2
