An Experimental Study of Different Aggregation Schemes in Semi-Asynchronous Federated Learning

Yunbo Li, Jiaping Gui, Yue Wu

TL;DR

This paper investigates the performance gap between gradient-based (FedSGD) and model-based (FedAvg) aggregation in semi-asynchronous federated learning (SAFL) under heterogeneous client conditions. Through extensive experiments across CV and NLP tasks, five datasets, multiple models, and various non-IID data distributions, the authors quantify differences in accuracy, convergence, oscillation, and resource usage. They find that FedSGD generally yields higher accuracy and faster convergence but suffers from greater instability and NaN loss events, while FedAvg offers smoother convergence and better straggler robustness at the cost of lower accuracy and higher transmission and training overhead. The findings provide practical guidance for choosing an aggregation strategy in SAFL, highlighting the tradeoffs between accuracy, convergence, stability, and resource efficiency in heterogeneous environments.

Abstract

Federated learning is highly valued because it enables high-performance computing in distributed environments while safeguarding data privacy. To address resource heterogeneity, researchers have proposed a semi-asynchronous federated learning (SAFL) architecture. However, the performance gap between different aggregation targets in SAFL remains unexplored. In this paper, we systematically compare the performance of two algorithm modes, FedSGD and FedAvg, which correspond to aggregating gradients and models, respectively. Our results across various task scenarios indicate that these two modes exhibit a substantial performance gap. Specifically, FedSGD achieves higher accuracy and faster convergence but experiences more severe fluctuations in accuracy, whereas FedAvg excels at handling straggler issues but converges more slowly and reaches lower accuracy.
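The distinction between aggregating gradients and aggregating models can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a toy quadratic loss per client, unweighted averaging, and a single synchronous round, and it does not model SAFL scheduling or staleness. The names `fedsgd_round`, `fedavg_round`, `grad`, and `mean` are hypothetical and chosen only for this illustration.

```python
from typing import List

Vector = List[float]


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise unweighted mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def grad(w: Vector, target: Vector) -> Vector:
    """Gradient of the toy client loss 0.5 * ||w - target||^2 (an assumption)."""
    return [wi - ti for wi, ti in zip(w, target)]


def fedsgd_round(global_w: Vector, client_targets: List[Vector], lr: float) -> Vector:
    """Gradient aggregation: each client sends one gradient computed at the
    global model; the server averages the gradients and takes one step."""
    grads = [grad(global_w, t) for t in client_targets]
    avg_grad = mean(grads)
    return [w - lr * g for w, g in zip(global_w, avg_grad)]


def fedavg_round(global_w: Vector, client_targets: List[Vector],
                 lr: float, local_steps: int) -> Vector:
    """Model aggregation: each client runs several local SGD steps starting
    from the global model and sends its model; the server averages the models."""
    local_models = []
    for t in client_targets:
        w = list(global_w)  # local copy of the global model
        for _ in range(local_steps):
            w = [wi - lr * gi for wi, gi in zip(w, grad(w, t))]
        local_models.append(w)
    return mean(local_models)


if __name__ == "__main__":
    targets = [[1.0], [3.0]]  # two clients with different local optima
    print(fedsgd_round([0.0], targets, lr=0.5))
    print(fedavg_round([0.0], targets, lr=0.5, local_steps=2))
```

In this toy setting the two rules coincide when each client takes exactly one local step, and they diverge once clients take several local steps, which is where the choice of aggregation target starts to matter.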

Paper Structure (53 sections, 7 equations, 24 figures, 3 tables)

Figures (24)

  • Figure 1: Computational process in synchronous federated learning and semi-asynchronous federated learning with $K=3$.
  • Figure 2: Global accuracy and loss of different models on the CIFAR-10 dataset using ResNet-18 in SAFL. Note that -1 denotes a NaN loss value.
  • Figure 3: Statistics of severe oscillations of the ResNet-18 model under the CIFAR-10 dataset with Hetero Dirichlet distribution.
  • Figure 4: Directional Gradient Aggregation.
  • Figure 5: Global accuracy and loss of different models on the CIFAR-100 dataset using ResNet-18 in SAFL. Note that -1 denotes a NaN loss value.
  • ...and 19 more figures