The Impact Analysis of Delays in Asynchronous Federated Learning with Data Heterogeneity for Edge Intelligence

Ziruo Hao, Zhenhua Cui, Tao Yang, Bo Hu, Xiaofeng Wu, Hui Feng

TL;DR

This work analyzes asynchronous federated learning (AFL) in edge networks under data heterogeneity and unknown delay causes. It introduces an asynchronous error measure and compares two aggregation schemes: AUDG, which uses only received delayed gradients, and PSURDG, which reuses delayed gradients to decouple heterogeneity from delays. The authors provide convergence bounds for synchronous updates (SFL), AUDG, and PSURDG under non-IID data, highlighting how delays interact with data heterogeneity and influence convergence. Theoretical results are complemented by simulations on MNIST with CNNs, showing that discarding outdated information is not always optimal and that gradient reuse can improve performance in high-heterogeneity, low-delay scenarios. Overall, the paper offers new insights into when and how delayed information should be handled in AFL for edge intelligence, with practical implications for designing delay-aware aggregation strategies.

Abstract

Federated learning (FL) has provided a new methodology for coordinating a group of clients to train a machine learning model collaboratively, bringing an efficient paradigm to edge intelligence. Despite its promise, FL faces several critical challenges in practical applications involving edge devices, such as data heterogeneity and delays stemming from communication and computation constraints. This paper examines the impact of delays with unknown causes on training performance in an Asynchronous Federated Learning (AFL) system with data heterogeneity. First, a definition of asynchronous error is proposed, based on which the adverse impact of data heterogeneity alone is theoretically analyzed within the traditional Synchronous Federated Learning (SFL) framework. Next, Asynchronous Updates with Delayed Gradients (AUDG), a conventional AFL scheme, is discussed. Investigation into AUDG reveals that the negative influence of data heterogeneity is correlated with delays, while a shorter average delay from a specific client does not consistently enhance training performance. To cover the scenarios to which AUDG is poorly suited, Pseudo-synchronous Updates by Reusing Delayed Gradients (PSURDG) is proposed, and its theoretical convergence is analyzed. In both AUDG and PSURDG, only a random set of clients successfully transmits their updated results to the central server in each iteration. The critical difference between them lies in whether the delayed information is reused. Finally, both schemes are validated and compared through theoretical analysis and simulations, demonstrating more intuitively that discarding outdated information due to time delays is not always the best approach.
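
The difference between the two schemes, as the abstract describes it, can be illustrated with a toy sketch. This is not the paper's algorithm: the function names `audg_step` and `psurdg_step`, the scalar model, the plain averaging, and the per-client `cache` dictionary are all assumptions made for illustration. The only point taken from the abstract is that one rule uses just the gradients that arrive in a round, while the other also reuses previously received (delayed) gradients.

```python
def audg_step(w, arrived, lr):
    """Assumed AUDG-style rule: average only the gradients that arrived this round.

    w       -- current scalar model parameter
    arrived -- dict mapping client id -> gradient received this round
    lr      -- learning rate
    """
    if not arrived:
        return w  # nothing arrived, so the model is left unchanged
    g = sum(arrived.values()) / len(arrived)
    return w - lr * g


def psurdg_step(w, arrived, cache, lr):
    """Assumed PSURDG-style rule: refresh a per-client cache with the new
    arrivals, then average over every cached gradient, so clients that
    failed to report this round contribute their last delivered gradient.

    cache -- dict mapping client id -> most recent gradient ever received;
             it is updated in place.
    """
    cache.update(arrived)
    if not cache:
        return w
    g = sum(cache.values()) / len(cache)
    return w - lr * g


# Hypothetical round: client 0 reports, client 1 only has an older gradient.
w = 1.0
arrived = {0: 2.0}
cache = {1: 4.0}
w_audg = audg_step(w, arrived, lr=0.5)             # uses gradient 2.0 only
w_psurdg = psurdg_step(w, arrived, cache, lr=0.5)  # averages 2.0 and 4.0
```

In this toy round the first rule ignores client 1 entirely, while the second still lets client 1's older gradient pull on the update, which is the kind of trade-off between staleness and client coverage that the paper analyzes.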

Paper Structure

This paper contains 13 sections, 4 theorems, 57 equations, 8 figures, 10 tables, 3 algorithms.

Key Result

Lemma 1

(Asynchronous Error Separation Rule) Under the AFL scheme, the loss function $f$ satisfies the following inequality after $T$ iterations, where $w^{1}$ denotes the initial parameters.

Figures (8)

  • Figure 1: Delays in an asynchronous federated system can be attributed to three primary factors: (a) slow computing speed, (b) failure to download global parameters, and (c) failure to upload local parameters.
  • Figure 2: The theoretical analysis structure
  • Figure 3: The Evolution of the Normalized Loss Values in SFL
  • Figure 4: Accuracy: Over-parameterized CNN & IID
  • Figure 5: Accuracy: Normal CNN & IID
  • ...and 3 more figures

Theorems & Definitions (6)

  • Definition 1
  • Lemma 1
  • Theorem 1
  • Theorem 2
  • Definition 2
  • Theorem 3