CSAFL: A Clustered Semi-Asynchronous Federated Learning Framework

Yu Zhang, Moming Duan, Duo Liu, Li Li, Ao Ren, Xianzhang Chen, Yujuan Tan, Chengliang Wang

TL;DR

The paper addresses the straggler and model-staleness problems in federated learning by proposing CSAFL, a clustered semi-asynchronous framework. CSAFL groups clients via spectral clustering based on delay and gradient direction, and allows mixed synchronous/asynchronous updates within fixed time budgets. By maintaining a separate model per group, CSAFL mitigates the straggler effect while controlling staleness, achieving higher accuracy and competitive convergence across four non-IID datasets. Compared to TA-FedAvg, it improves test accuracy by more than 5% on all four datasets and by 34.4% (absolute) on non-IID FEMNIST, which supports latency-aware, group-based update strategies as a practical approach to FL in heterogeneous settings.
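The grouping idea can be illustrated with a minimal sketch. This summary does not give the paper's exact affinity definition, so everything below is an assumption made for illustration: the function names, the `latency_scale` parameter, the way latency similarity and update-direction similarity are combined, and the restriction to two groups (a split on the sign of the Fiedler vector rather than a full k-way spectral clustering).

```python
import numpy as np

def affinity_matrix(latencies_ms, updates, latency_scale=1000.0):
    """Hypothetical affinity: clients are similar when their latencies are
    close AND their update vectors point in similar directions.
    Assumes every update vector is nonzero."""
    latencies_ms = np.asarray(latencies_ms, dtype=float)
    updates = np.asarray(updates, dtype=float)
    # Cosine similarity of update directions, rescaled from [-1, 1] to [0, 1].
    unit = updates / np.linalg.norm(updates, axis=1, keepdims=True)
    direction_sim = (unit @ unit.T + 1.0) / 2.0
    # Latency similarity decays with the absolute latency gap.
    gap = np.abs(latencies_ms[:, None] - latencies_ms[None, :])
    latency_sim = np.exp(-gap / latency_scale)
    return direction_sim * latency_sim

def spectral_bipartition(affinity):
    """Split clients into two groups using the sign of the second-smallest
    eigenvector (Fiedler vector) of the normalized graph Laplacian."""
    degree = affinity.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    laplacian = np.eye(len(affinity)) - d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]
    return (fiedler > 0).astype(int)

# Illustrative data (not from the paper): two fast clients with similar
# updates, two slow clients with roughly opposite updates.
latencies = [100, 120, 5000, 5200]
updates = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0], [-0.9, -0.1]]
labels = spectral_bipartition(affinity_matrix(latencies, updates))
print(labels)  # the two fast clients share one label, the two slow clients the other
```

Which group receives label 0 or 1 depends on the sign convention of the eigenvector, so only the grouping itself is meaningful.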

Abstract

Federated learning (FL) is an emerging distributed machine learning paradigm that protects privacy and tackles the problem of isolated data islands. At present, FL has two main communication strategies: synchronous FL and asynchronous FL. Synchronous FL offers high model precision and fast convergence. However, it carries the risk that the central server waits too long for slow devices, namely the straggler effect, which has a negative impact on some time-critical applications. Asynchronous FL naturally mitigates the straggler effect, but it risks model quality degradation and server crashes. Therefore, we combine the advantages of these two strategies to propose a clustered semi-asynchronous federated learning (CSAFL) framework. We evaluate CSAFL on four imbalanced federated datasets in a non-IID setting and compare it to the baseline methods. The experimental results show that CSAFL improves test accuracy by more than 5% on the four datasets compared to TA-FedAvg. In particular, CSAFL improves absolute test accuracy by 34.4% on non-IID FEMNIST compared to TA-FedAvg.
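The straggler effect described above can be made concrete with a small sketch. It assumes a simplified synchronous round in which the server waits for every selected client, so the round lasts as long as the slowest client and every faster client sits idle for the difference. The function name and the latency values are hypothetical and are not taken from the paper.

```python
def synchronous_round_idle_times(latencies_ms):
    """Return (round_time, idle_times) for one synchronous round.

    Simplifying assumption: the server aggregates only after the slowest
    selected client has reported, so each client idles until then.
    """
    round_time = max(latencies_ms)
    idle_times = [round_time - t for t in latencies_ms]
    return round_time, idle_times

# Illustrative latencies in milliseconds (hypothetical values).
round_time, idle = synchronous_round_idle_times([1000, 4000, 15000])
print(round_time)  # 15000
print(idle)        # [14000, 11000, 0]
```

Under this model a single slow client sets the pace for the whole round, which is the imbalance a semi-asynchronous scheme tries to reduce.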

Paper Structure

This paper contains 18 sections, 12 equations, 6 figures, 2 tables, 4 algorithms.

Figures (6)

  • Figure 1: The training procedures of synchronous FL and asynchronous FL
  • Figure 2: The figure on the left shows the straggler effect in the synchronous FL framework: the frequency distribution of the clients' idle time is unbalanced. The figure on the right compares the accuracy of asynchronous FL and synchronous FL during model training.
  • Figure 3: The framework of CSAFL. In a group training process, group 1 randomly selects three clients, A, B, and C. The up arrow indicates that the client uploads its local model to the group model, and the down arrow indicates that the client downloads the parameters of the group model.
  • Figure 4: The accuracy curves of CSAFL, R-FedAvg, baselines and NoG-FedAvg on MNIST and FEMNIST. 10K=10000, 15K=15000, unit: ms.
  • Figure 5: The frequency distribution of the clients' idle time. 10K=10000, 15K=15000, unit: ms.
  • ...and 1 more figures