FedFa: A Fully Asynchronous Training Paradigm for Federated Learning

Haotian Xu, Zhaorui Zhang, Sheng Di, Benben Liu, Khalid Ayed Alharthi, Jiannong Cao

TL;DR

The paper tackles the wall-clock inefficiency of standard Federated Learning caused by barrier synchronization across heterogeneous devices. It introduces FedFa, a fully asynchronous training paradigm that maintains a server-side sliding window to merge multiple historical updates, thereby eliminating waiting times while preserving convergence. A convergence analysis under standard assumptions shows FedFa attains a convergence rate comparable to FedBuff, and two practical variants, FedFa-Param and FedFa-Delta, are proposed. Empirical results on CIFAR-10 and NLP tasks demonstrate substantial improvements in wall-clock time (up to sixfold) and reductions in communication rounds (up to 1.9x) with maintained accuracy in both IID and Non-IID settings, and the method remains compatible with secure aggregation.

Abstract

Federated learning has been identified as an efficient decentralized training paradigm for scaling machine learning model training across a large number of devices while guaranteeing the data privacy of the trainers. FedAvg has become a foundational parameter update strategy for federated learning, promising to eliminate the effect of heterogeneous data across clients and to guarantee convergence. However, the synchronization barriers on parameter updates in each communication round cause significant time to be spent waiting, slowing down the training procedure. Recent state-of-the-art solutions therefore propose semi-asynchronous approaches to mitigate the waiting time cost with guaranteed convergence. Nevertheless, emerging semi-asynchronous approaches are unable to eliminate the waiting time completely. We propose a fully asynchronous training paradigm, called FedFa, which can guarantee model convergence and eliminate the waiting time completely for federated learning by using a few buffered results on the server for parameter updating. Further, we provide a theoretical proof of the convergence rate for our proposed FedFa. Extensive experimental results indicate that our approach effectively improves the training performance of federated learning, achieving up to 6x and 4x speedups compared to state-of-the-art synchronous and semi-asynchronous strategies, respectively, while retaining high accuracy in both IID and Non-IID scenarios.
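
The idea of updating the global model from a few buffered results, without waiting for any other client, can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `SlidingWindowServer`, the `receive` method, and the plain unweighted average over the window are all assumptions made for illustration, and the paper's actual update rules (FedFa-Param and FedFa-Delta) may weight or combine the buffered results differently.

```python
from collections import deque


class SlidingWindowServer:
    """Hypothetical server that keeps the K most recent client results.

    Assumption: the global model is the unweighted mean of the parameter
    vectors currently held in the window. This is an illustrative choice,
    not the update rule stated in the paper.
    """

    def __init__(self, init_params, window_size):
        self.global_params = list(init_params)
        # A deque with maxlen drops the oldest entry automatically
        # once the window is full.
        self.window = deque(maxlen=window_size)

    def receive(self, client_params):
        """Handle one client result as soon as it arrives (no barrier)."""
        self.window.append(list(client_params))
        count = len(self.window)
        # Average each coordinate over the buffered results.
        self.global_params = [sum(vals) / count for vals in zip(*self.window)]
        return self.global_params


server = SlidingWindowServer(init_params=[0.0, 0.0], window_size=2)
first = server.receive([2.0, 4.0])   # window holds one result
second = server.receive([4.0, 8.0])  # window holds two results
third = server.receive([6.0, 0.0])   # oldest result is evicted
```

Because every arrival triggers an update immediately, no client ever waits on a slower one; the window only bounds how many historical results influence the current global model.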

Paper Structure

This paper contains 16 sections, 23 equations, 6 figures, 6 tables, and 2 algorithms.

Figures (6)

  • Figure 1: The wall clock time comparison.
  • Figure 2: Data distribution, where a larger circle indicates a larger dataset size for each label (0-9) of CIFAR-10.
  • Figure 3: Performance comparison of different training strategies on both IID and Non-IID data settings regarding the wall clock time.
  • Figure 4: Performance comparison of different training strategies on both IID and Non-IID data settings regarding communication round.
  • Figure 5: The comparison of different buffer sizes $K$.
  • ...and 1 more figure