Online Client Scheduling and Resource Allocation for Efficient Federated Edge Learning

Zhidong Gao, Zhenxiao Zhang, Yu Zhang, Tongnian Wang, Yanmin Gong, Yuanxiong Guo

TL;DR

This paper studies client scheduling and resource allocation for FL over mobile edge networks under resource constraints and uncertainty, with the goal of minimizing training latency while maintaining model accuracy. It develops an online control scheme based on Lyapunov optimization for client sampling and resource allocation.

Abstract

Federated learning (FL) enables edge devices to collaboratively train a machine learning model without sharing their raw data. Due to its privacy-protecting benefits, FL has been deployed in many real-world applications. However, deploying FL over mobile edge networks with constrained resources such as power, bandwidth, and computation suffers from high training latency and low model accuracy, particularly under data and system heterogeneity. In this paper, we investigate the optimal client scheduling and resource allocation for FL over mobile edge networks under resource constraints and uncertainty, aiming to minimize the training latency while maintaining the model accuracy. Specifically, we first analyze the impact of client sampling on model convergence in FL and formulate a stochastic optimization problem that captures the trade-off between running time and model performance under heterogeneous and uncertain system resources. To solve the formulated problem, we then develop an online control scheme based on Lyapunov optimization for client sampling and resource allocation that requires no knowledge of future dynamics in the FL system. Extensive experimental results demonstrate that the proposed scheme improves both training latency and resource efficiency compared with existing schemes.
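To make the idea of Lyapunov-based online control concrete, the following is a minimal Python sketch of the generic drift-plus-penalty technique, not the paper's actual scheme. It assumes a single time-averaged energy budget enforced through a virtual queue, and a small hypothetical menu of per-round actions given as (latency, energy) pairs. The names `VirtualQueue`, `choose_action`, `run`, and the trade-off weight `v` are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class VirtualQueue:
    """Virtual queue tracking accumulated violation of a per-round budget."""
    budget: float          # assumed per-round energy budget (hypothetical units)
    backlog: float = 0.0

    def update(self, spent: float) -> float:
        # Standard virtual-queue update: Q <- max(Q + spent - budget, 0).
        self.backlog = max(self.backlog + spent - self.budget, 0.0)
        return self.backlog


def choose_action(actions, backlog, v):
    """Return the (latency, energy) pair minimizing v * latency + backlog * energy.

    Only the current backlog is used, so no future information is needed.
    """
    return min(actions, key=lambda a: v * a[0] + backlog * a[1])


def run(rounds, actions, budget, v):
    """Simulate the online rule for a fixed number of rounds."""
    queue = VirtualQueue(budget=budget)
    history = []
    for _ in range(rounds):
        latency, energy = choose_action(actions, queue.backlog, v)
        queue.update(energy)
        history.append((latency, energy))
    return history, queue.backlog


if __name__ == "__main__":
    # Hypothetical menu: a fast but energy-hungry action and a slow, frugal one.
    actions = [(1.0, 4.0), (3.0, 1.0)]
    history, backlog = run(rounds=6, actions=actions, budget=2.0, v=1.0)
    print(history, backlog)
```

In this toy setting the rule alternates between the fast and the frugal action so that the average energy spent stays at the budget. A larger `v` puts more weight on latency and lets the backlog grow further before the frugal action is chosen.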

Paper Structure

This paper contains 30 sections, 8 theorems, 73 equations, 6 figures, 1 table, and 2 algorithms.

Key Result

Theorem 1

Under the assumptions on smoothness, bounded SGD, and bounded dissimilarity, if the local learning rate $\eta \leq \min\{1/(32E^2\beta^2\gamma^2),\ 1/(2\sqrt{2}E\beta)\}$, then Algorithm 1 satisfies
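The step-size condition on $\eta$ can be evaluated numerically. The sketch below computes the largest local learning rate the condition allows. The function name `max_local_learning_rate` is hypothetical, and reading $E$ as the number of local steps, $\beta$ as the smoothness constant, and $\gamma$ as the dissimilarity constant is an assumption, since the excerpt does not define these symbols.

```python
import math


def max_local_learning_rate(E: int, beta: float, gamma: float) -> float:
    """Upper bound on eta: min{1/(32 E^2 beta^2 gamma^2), 1/(2 sqrt(2) E beta)}.

    Symbol meanings (local steps, smoothness, dissimilarity) are assumed.
    """
    if E <= 0 or beta <= 0 or gamma <= 0:
        raise ValueError("E, beta and gamma must be positive")
    first = 1.0 / (32 * E**2 * beta**2 * gamma**2)
    second = 1.0 / (2 * math.sqrt(2) * E * beta)
    return min(first, second)


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    print(max_local_learning_rate(E=1, beta=1.0, gamma=1.0))
    print(max_local_learning_rate(E=5, beta=1.0, gamma=0.01))
```

Which of the two terms is active depends on the constants: with $E=\beta=\gamma=1$ the first term ($1/32$) is the smaller one, while for a small $\gamma$ the second term becomes the binding limit.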

Figures (6)

  • Figure 1: Convergence rate and runtime comparisons of LROA and baselines on CIFAR-10. (a) Testing accuracy with runtime; (b) Testing accuracy with communication rounds.
  • Figure 2: Convergence rate and runtime comparisons of LROA and baselines on FEMNIST. (a) Testing accuracy with runtime; (b) Testing accuracy with communication rounds.
  • Figure 3: Testing accuracy vs. total time of LROA under different $\lambda$ on CIFAR-10 (a) and FEMNIST (b).
  • Figure 4: Convergence of energy and objective value for CIFAR-10 (a,b) and FEMNIST (c,d). (a,c) Expected time-averaged energy consumption. (b,d) Expected time-averaged objective value. $\mu=1.0$ and $\nu\in \{10^3,10^4,10^5,10^6\}$.
  • Figure 5: Testing accuracy vs. total time of LROA under different sampling numbers $K$ on CIFAR-10 (a) and FEMNIST (b). The marker shows the ending point of each curve. (Circle: LROA, Square: Uni-D)
  • ...and 1 more figure

Theorems & Definitions (8)

  • Theorem 1: Convergence Result with Adaptive Sampling Probabilities
  • Lemma 1
  • Theorem 2: Solution to P2.1.1
  • Theorem 3: Solution to P2.1.2
  • Theorem 4
  • Lemma 2
  • Lemma 3: Unbiased sampling and Bounded expectation
  • Lemma 4: Bounded Local Divergence
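The titles of Theorem 1 and Lemma 3 refer to adaptive sampling probabilities and unbiased sampling. As a hedged illustration of why non-uniform client sampling can remain unbiased, the sketch below uses standard inverse-probability weighting: each sampled update is scaled by its aggregation weight divided by its sampling probability. This is a common technique and is only assumed to resemble the paper's aggregation rule, which is not shown in this excerpt. The names `ipw_aggregate` and `expected_single_draw` are hypothetical, and updates are scalars for simplicity.

```python
def ipw_aggregate(sampled_ids, updates, weights, probs):
    """Average of (w_i / q_i) * x_i over clients sampled with replacement.

    updates: per-client updates x_i (scalars here for simplicity)
    weights: aggregation weights w_i that sum to 1
    probs:   sampling probabilities q_i that are positive and sum to 1
    """
    k = len(sampled_ids)
    return sum(weights[i] / probs[i] * updates[i] for i in sampled_ids) / k


def expected_single_draw(updates, weights, probs):
    """Exact expectation of (w_i / q_i) * x_i when client i is drawn w.p. q_i."""
    return sum(q * (w / q) * x for x, w, q in zip(updates, weights, probs))


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    updates = [1.0, 2.0, 4.0]
    weights = [0.5, 0.25, 0.25]
    probs = [0.2, 0.3, 0.5]
    target = sum(w * x for w, x in zip(weights, updates))
    print(target, expected_single_draw(updates, weights, probs))
    print(ipw_aggregate([0, 2], updates, weights, probs))
```

The expectation matches the full weighted average for any valid choice of `probs`, so the sampling probabilities can be tuned for latency or energy without biasing the aggregate. What changes with `probs` is the variance of the estimate.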