FedQS: Optimizing Gradient and Model Aggregation for Semi-Asynchronous Federated Learning

Yunbo Li, Jiaping Gui, Zhihang Deng, Fanchao Meng, Yue Wu

TL;DR

FedQS targets both gradient aggregation and model aggregation in semi-asynchronous federated learning (SAFL). It classifies clients into four types and adapts each client's local training accordingly, while a server-side module reweights and aggregates the resulting updates. The approach combines pseudo-global gradient estimation, per-type training adaptations, and a dynamic weighting scheme to balance stability against convergence speed, with formal convergence guarantees for both aggregation modes. Empirically, FedQS achieves the highest accuracy and lowest loss and ranks among the fastest in convergence across CV, NLP, and real-world tasks, and it is reported to be robust to varying system settings and hyperparameters. The framework bridges the gap between gradient and model aggregation in SAFL and is aimed at practical federated deployments.

Abstract

Federated learning (FL) enables collaborative model training across multiple parties without sharing raw data, with semi-asynchronous FL (SAFL) emerging as a balanced approach between synchronous and asynchronous FL. However, SAFL faces significant challenges in optimizing both gradient-based (e.g., FedSGD) and model-based (e.g., FedAvg) aggregation strategies, which exhibit distinct trade-offs in accuracy, convergence speed, and stability. While gradient aggregation achieves faster convergence and higher accuracy, it suffers from pronounced fluctuations, whereas model aggregation offers greater stability but slower convergence and suboptimal accuracy. This paper presents FedQS, the first framework to theoretically analyze and address these disparities in SAFL. FedQS introduces a divide-and-conquer strategy to handle client heterogeneity by classifying clients into four distinct types and adaptively optimizing their local training based on data distribution characteristics and available computational resources. Extensive experiments on computer vision, natural language processing, and real-world tasks demonstrate that FedQS achieves the highest accuracy, attains the lowest loss, and ranks among the fastest in convergence speed, outperforming state-of-the-art baselines. Our work bridges the gap between aggregation strategies in SAFL, offering a unified solution for stable, accurate, and efficient federated learning. The code and datasets are available at https://github.com/bkjod/FedQS_.
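
To make the distinction between gradient aggregation and model aggregation concrete, here is a minimal sketch of one server round of each on a toy problem. It uses the generic textbook forms of FedSGD and FedAvg, not the FedQS algorithm, and it is fully synchronous, so it models neither staleness nor client typing. The scalar quadratic loss, the function names, and all parameter values are assumptions made for illustration.

```python
# Generic FedSGD-style vs. FedAvg-style rounds on a toy problem.
# Assumption: client i holds the scalar loss f_i(w) = 0.5 * (w - c_i)^2,
# so its gradient at w is simply (w - c_i). All names here are hypothetical.

def local_gradient(w, center):
    """Gradient of 0.5 * (w - center)^2 with respect to w."""
    return w - center

def gradient_aggregation_round(w, centers, server_lr):
    """Clients send gradients at the global model; the server averages and steps."""
    grads = [local_gradient(w, c) for c in centers]
    avg_grad = sum(grads) / len(grads)
    return w - server_lr * avg_grad

def model_aggregation_round(w, centers, local_lr, local_steps):
    """Clients train locally from the global model; the server averages the models."""
    local_models = []
    for c in centers:
        w_i = w
        for _ in range(local_steps):
            w_i -= local_lr * local_gradient(w_i, c)
        local_models.append(w_i)
    return sum(local_models) / len(local_models)

centers = [-1.0, 0.5, 3.0]   # hypothetical per-client optima
w0 = 10.0                    # hypothetical initial global model

w_grad = gradient_aggregation_round(w0, centers, server_lr=0.1)
w_model_1 = model_aggregation_round(w0, centers, local_lr=0.1, local_steps=1)
w_model_5 = model_aggregation_round(w0, centers, local_lr=0.1, local_steps=5)
print(w_grad, w_model_1, w_model_5)
```

On this toy problem the two rounds coincide when each client takes a single local step with the same learning rate, and they differ once clients take several local steps. The trade-offs described in the abstract arise in the semi-asynchronous, heterogeneous setting, which this sketch deliberately leaves out.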

Paper Structure

This paper contains 36 sections, 9 theorems, 85 equations, 13 figures, 14 tables.

Key Result

Theorem 4.2

Under the paper's assumptions, let $\beta = \max_{i,t}\{\eta_i^t, \eta_g\}$ with $\sqrt{\frac{1}{RK - 1}} < \beta < \sqrt{\frac{3}{2RK-3}}$, where $R = \frac{E\theta - E\theta^2 - \theta^2 + \theta^{E+2}}{(1 - \theta)^2}$. In the formula for $R$, $E$ is the maximum local epoch, and $\theta = \max_{i,t} \{m_i^t \ldots$ [...] where $\mathcal{V} = 3 - \frac{2\beta^2KR}{\beta^2 + 1} \in (0, 1)$ controls the convergence rate.
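
The conditions in Theorem 4.2 can be checked numerically. The sketch below evaluates $R$ from its closed form, computes the admissible interval for $\beta$, and confirms that $\mathcal{V}$ falls in $(0, 1)$ for a $\beta$ inside that interval. The values of $E$, $\theta$, and $K$ are hypothetical ($K$ is not defined in this excerpt), and the function names are illustrative. The double-sum form of $R$ is an algebraic identity used here only as a sanity check; it is not a statement taken from the source.

```python
import math

def R_closed_form(E, theta):
    """R as written in the theorem statement (requires theta != 1)."""
    return (E * theta - E * theta**2 - theta**2 + theta**(E + 2)) / (1 - theta) ** 2

def R_double_sum(E, theta):
    """Identity check: sum over k = 1..E of sum over j = 1..k of theta**j."""
    return sum(theta**j for k in range(1, E + 1) for j in range(1, k + 1))

def beta_bounds(R, K):
    """Open interval (lo, hi) for beta from the theorem's step-size condition."""
    RK = R * K
    if 2 * RK - 3 <= 0:
        raise ValueError("the bounds need 2*R*K - 3 > 0")
    return math.sqrt(1 / (RK - 1)), math.sqrt(3 / (2 * RK - 3))

def rate_factor_V(beta, R, K):
    """V = 3 - 2*beta^2*K*R / (beta^2 + 1)."""
    return 3 - 2 * beta**2 * K * R / (beta**2 + 1)

# Hypothetical values chosen only for illustration.
E, theta, K = 5, 0.5, 10

R = R_closed_form(E, theta)
lo, hi = beta_bounds(R, K)
beta = 0.5 * (lo + hi)          # any point strictly inside the interval
V = rate_factor_V(beta, R, K)

print(f"R = {R}, beta in ({lo:.4f}, {hi:.4f}), V = {V:.4f}")
```

At the two endpoints of the interval $\mathcal{V}$ equals $1$ and $0$ respectively, which is why the step-size condition and the requirement $\mathcal{V} \in (0, 1)$ are equivalent.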

Figures (13)

  • Figure 1: FedSGD vs. FedAvg in SAFL.
  • Figure 1: The average best accuracy and corresponding differences between two aggregation strategies under varying influencing factors.
  • Figure 2: Workflow of FedQS, featuring clients with diverse resource capabilities. In FedQS, during a global training round, Mod① first utilizes the global model distributed by Mod③ to compute pseudo-global gradients and sends them to Mod②. Then, Mod② employs the information disseminated by Mod③ as input when determining a local training strategy and leverages the global model from Mod③ as the starting point for local training. Finally, Mod③ uses the local update data from Mod② for global aggregation and leverages the similarity information from Mod② to update the global state table.
  • Figure 3: Categorization of clients in Mod②.
  • Figure 4: Loss of FedQS and the baselines under a representative CV task ($x = 0.5$), an NLP task ($R=600$), and an RWD task (gender). The left three subfigures are based on model aggregation, while the right three are based on gradient aggregation.
  • ...and 8 more figures

Theorems & Definitions (25)

  • Remark 4.1
  • Theorem 4.2: Gradient Aggregation Convergence
  • Theorem 4.3: Model Aggregation Convergence
  • Remark 4.4
  • Remark 4.5
  • Remark 4.6
  • Remark 4.7
  • Theorem A.4: The convergence of FedQS-SGD
  • proof
  • Theorem A.5: The convergence of FedQS-Avg
  • ...and 15 more