Adaptive Heterogeneous Client Sampling for Federated Learning over Wireless Networks

Bing Luo, Wenli Xiao, Shiqiang Wang, Jianwei Huang, Leandros Tassiulas

TL;DR

This work tackles wall-clock time minimization for federated learning over bandwidth-limited wireless networks by jointly optimizing adaptive, heterogeneous client sampling and bandwidth allocation. It derives a tractable convergence bound for arbitrary sampling, formulates an approximate non-convex objective, and develops a practical two-stage method to learn unknown parameters and compute an effective sampling distribution. The proposed scheme demonstrates significant reductions in convergence time compared with baselines on hardware prototypes and simulations, including non-convex loss scenarios. The results reveal how system heterogeneity (computation and communication times) and statistical heterogeneity (data quality/quantity) interact to shape optimal client sampling, and show a clear trade-off in the number of sampled clients per round.

Abstract

Federated learning (FL) algorithms usually sample a fraction of clients in each round (partial participation) when the number of participants is large and the server's communication bandwidth is limited. Recent works on the convergence analysis of FL have focused on unbiased client sampling, e.g., sampling uniformly at random, which suffers from slow wall-clock time for convergence due to high degrees of system heterogeneity and statistical heterogeneity. This paper aims to design an adaptive client sampling algorithm for FL over wireless networks that tackles both system and statistical heterogeneity to minimize the wall-clock convergence time. We obtain a new tractable convergence bound for FL algorithms with arbitrary client sampling probability. Based on the bound, we analytically establish the relationship between the total learning time and sampling probability with an adaptive bandwidth allocation scheme, which results in a non-convex optimization problem. We design an efficient algorithm for learning the unknown parameters in the convergence bound and develop a low-complexity algorithm to approximately solve the non-convex problem. Our solution reveals the impact of system and statistical heterogeneity parameters on the optimal client sampling design. Moreover, our solution shows that as the number of sampled clients increases, the total convergence time first decreases and then increases, because a larger sampling number reduces the number of rounds for convergence but results in a longer expected time per round due to limited wireless bandwidth. Experimental results from both a hardware prototype and simulations demonstrate that our proposed sampling scheme significantly reduces the convergence time compared to several baseline sampling schemes.

Paper Structure

This paper contains 31 sections, 5 theorems, 40 equations, 4 figures, 3 tables, 2 algorithms.

Key Result

Lemma 1

(Adaptive Client Sampling and Model Aggregation) When clients $\mathcal{K}^{(r)}(\boldsymbol{q})$ are sampled with probability $\boldsymbol{q}=\{q_1, \ldots, q_N\}$ and their local updates are aggregated as then we have
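
The aggregation rule and the conclusion of Lemma 1 are not preserved in this listing, so the following is only a minimal sketch of one standard way to aggregate under non-uniform sampling, not necessarily the paper's exact rule. It assumes that $K$ clients are drawn independently with replacement according to $\boldsymbol{q}$, and that each sampled update is scaled by the inverse-probability weight $p_i / (K q_i)$, where the aggregation weights `p` (for example, data-size fractions) are an assumption introduced here. All function names are hypothetical. Under these assumptions the aggregated model equals the full-participation aggregate in expectation, which the last two helpers check by exact enumeration.

```python
import random
from itertools import product


def sample_clients(q, k, rng):
    """Draw k client indices i.i.d. (with replacement) according to q."""
    return rng.choices(range(len(q)), weights=q, k=k)


def aggregate(w_global, local_models, sampled, p, q):
    """Inverse-probability-weighted aggregation (assumed rule, see lead-in).

    Each sampled client i contributes (p[i] / (k * q[i])) * (w_i - w_global).
    """
    k = len(sampled)
    new_w = list(w_global)
    for i in sampled:
        scale = p[i] / (k * q[i])
        for d in range(len(w_global)):
            new_w[d] += scale * (local_models[i][d] - w_global[d])
    return new_w


def full_participation(w_global, local_models, p):
    """Aggregate over all clients with weights p (no sampling)."""
    return [
        w_global[d]
        + sum(p[i] * (local_models[i][d] - w_global[d]) for i in range(len(p)))
        for d in range(len(w_global))
    ]


def expected_aggregate(w_global, local_models, p, q, k):
    """Exact expectation over all length-k draws (feasible only for small N, k)."""
    exp_w = [0.0] * len(w_global)
    for draw in product(range(len(q)), repeat=k):
        prob = 1.0
        for i in draw:
            prob *= q[i]
        w = aggregate(w_global, local_models, draw, p, q)
        for d in range(len(w)):
            exp_w[d] += prob * w[d]
    return exp_w


if __name__ == "__main__":
    p = [0.5, 0.3, 0.2]            # assumed aggregation weights
    q = [0.2, 0.3, 0.5]            # sampling distribution (illustrative values)
    w_global = [1.0, -1.0]
    local_models = [[2.0, 0.0], [0.0, 1.0], [1.5, -2.0]]

    rng = random.Random(0)
    sampled = sample_clients(q, k=2, rng=rng)
    print("one sampled round:", aggregate(w_global, local_models, sampled, p, q))
    print("expectation:      ", expected_aggregate(w_global, local_models, p, q, k=2))
    print("full aggregate:   ", full_participation(w_global, local_models, p))
```

The inverse-probability weight is what allows $\boldsymbol{q}$ to be chosen freely, for instance to favor clients with short round times, without biasing the aggregated model in this sketch.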

Figures (4)

  • Figure 1: A heterogeneous federated learning training round over wireless networks, where $K$ out of $N$ clients are sampled according to the probability distribution $\boldsymbol{q}=\{q_1, \ldots, q_i, \ldots, q_N\}$, with each sampled client $i$ being allocated bandwidth $f_i$.
  • Figure 2: Hardware prototype with the laptop serving as the central server and 40 Raspberry Pis serving as clients.
  • Figure 3: Performance of Setup 1 with logistic regression and the EMNIST dataset for reaching target loss $1.16$ and target accuracy $70\%$.
  • Figure 4: Performance of Setup 2 with logistic regression and the Synthetic dataset for reaching target loss $0.7$ and target accuracy $78\%$.

Theorems & Definitions (10)

  • Lemma 1
  • proof
  • Theorem 1
  • proof
  • Theorem 2
  • proof
  • Lemma 2
  • proof
  • Theorem 3
  • proof