FedAST: Federated Asynchronous Simultaneous Training

Baris Askin, Pranay Sharma, Carlee Joe-Wong, Gauri Joshi

TL;DR

FedAST addresses the challenge of training multiple FL models concurrently on a shared pool of clients by introducing buffered asynchronous aggregation and dynamic cross-task resource allocation. It combines a server-side Realloc mechanism with per-task buffers to balance heterogeneity and staleness, leveraging a virtual sequence analysis to derive convergence guarantees for smooth non-convex objectives. Theoretical results decompose the convergence bound into FedAvg-like terms and an asynchronous aggregation term, and show that with appropriate learning rates and buffering, FedAST matches favorable convergence rates while reducing wall-clock training time. Empirically, FedAST outperforms synchronous and asynchronous baselines across MNIST, Fashion-MNIST, CIFAR-10/100, and Shakespeare, achieving up to 46% reductions in time to completion for multiple tasks and demonstrating the practical value of dynamic allocation and buffering in multi-task federated settings.

Abstract

Federated Learning (FL) enables edge devices or clients to collaboratively train machine learning (ML) models without sharing their private data. Much of the existing work in FL focuses on efficiently learning a model for a single task. In this paper, we study simultaneous training of multiple FL models using a common set of clients. The few existing simultaneous training methods employ synchronous aggregation of client updates, which can cause significant delays because large models and/or slow clients can bottleneck the aggregation. On the other hand, a naive asynchronous aggregation is adversely affected by stale client updates. We propose FedAST, a buffered asynchronous federated simultaneous training algorithm that overcomes bottlenecks from slow models and adaptively allocates client resources across heterogeneous tasks. We provide theoretical convergence guarantees for FedAST for smooth non-convex objective functions. Extensive experiments over multiple real-world datasets demonstrate that our proposed method outperforms existing simultaneous FL approaches, achieving up to 46.0% reduction in time to train multiple tasks to completion.
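
The buffered asynchronous aggregation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `TaskState` structure, the `receive_update` function, and the plain averaging rule are assumptions made for illustration. The sketch also assumes each client update is a parameter difference (locally trained model minus the model the client started from). Each task keeps its own buffer, and the server updates a task's model only once that buffer is full, so a slow task never blocks a fast one.

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class TaskState:
    """Server-side state for one task (hypothetical structure)."""
    model: np.ndarray       # current global parameters for this task
    buffer_size: int        # updates to collect before aggregating
    server_lr: float = 1.0  # server learning rate for this task
    version: int = 0        # incremented on every aggregation
    buffer: list = field(default_factory=list)


def receive_update(task: TaskState, delta: np.ndarray) -> bool:
    """Store one client update and aggregate when the buffer is full.

    Returns True if an aggregation step was taken.
    """
    task.buffer.append(delta)
    if len(task.buffer) < task.buffer_size:
        return False
    # Assumption: buffered updates are averaged with equal weight.
    mean_delta = np.mean(task.buffer, axis=0)
    task.model = task.model + task.server_lr * mean_delta
    task.version += 1
    task.buffer.clear()
    return True


# Two tasks sharing one server, each with its own buffer size.
tasks = {
    "a": TaskState(model=np.zeros(2), buffer_size=2),
    "b": TaskState(model=np.zeros(3), buffer_size=1),
}
```

Updates for different tasks can arrive in any order. A call such as `receive_update(tasks["b"], np.ones(3))` aggregates immediately because task `b` has a buffer of size one, while task `a` waits for a second update. A larger buffer averages over more updates per server step, which is the lever the abstract describes for limiting the effect of stale updates.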

Paper Structure

This paper contains 65 sections, 10 theorems, 56 equations, 26 figures, 4 tables, 4 algorithms.

Key Result

Theorem 1

Suppose that the assumptions labeled assump:smoothness through assump:maxstale hold, that there are $R_m$ active local training requests corresponding to task $m \in [M]$, and that the server and client learning rates, $\eta^{s}_{m}$ and $\eta^{c}_{m}$ respectively, satisfy $\eta^{s}_{m} \leq \sqrt{\tau_m b_m}$ and $\eta^{c}_{m}$ [...], where $\delta_m = f_{m}({\mathbf x}_{m}^{(0)}) - \min_{\mathbf x} f_{m} ({\mathbf x})$.
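
As a small numerical illustration, the server learning-rate condition $\eta^{s}_{m} \leq \sqrt{\tau_m b_m}$ from the theorem can be checked per task. The sketch below assumes $\tau_m$ is the number of local steps and $b_m$ the buffer size for task $m$; neither symbol is defined in this excerpt, and the function name is hypothetical.

```python
import math


def server_lr_ok(eta_s: float, tau_m: int, b_m: int) -> bool:
    """Return True if eta_s <= sqrt(tau_m * b_m).

    tau_m and b_m are read here as local steps and buffer size,
    which is an assumption not stated in the excerpt.
    """
    return eta_s <= math.sqrt(tau_m * b_m)
```

For example, with `tau_m=5` and `b_m=4` the bound is $\sqrt{20} \approx 4.47$, so a server learning rate of 1.0 satisfies it and 5.0 does not.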

Figures (26)

  • Figure 1: In our proposed algorithm $\texttt{FedAST}$, the server assigns local training requests (shown as striped and orange blocks for two simultaneous tasks), which are queued at the clients and processed in a first-come-first-served manner. Completed requests are aggregated asynchronously at the server. The figure shows snapshots of the process at two different times. By adjusting the number of requests per task, $\texttt{FedAST}$ periodically reallocates the resources shared across models.
  • Figure 2: Mean test accuracy for compared algorithms on six identical CIFAR-10 tasks trained simultaneously. $\texttt{FedAST}$ trains faster than synchronous methods. The synchronous method without straggler mitigation is by far the slowest.
  • Figure 3: The mean final test accuracy values of $\texttt{FedAST}$ (blue), $\texttt{FedAST-NoBuffer}$ (olive green) and centralized training (violet) with varying active client ratio, when training $3$ identical models. The left (right) figure is for the CIFAR-10 (Fashion-MNIST) dataset. With more active clients, the buffer becomes more important because staleness increases.
  • Figure 4: The mean test accuracy values of $\texttt{FedAST}$ and $\texttt{FedAST-NoBuffer}$, when simultaneously training one model for CIFAR-10 and one for Fashion-MNIST. $\texttt{FedAST}$ achieves higher and more stable accuracy levels.
  • Figure 5: Mean training times of $\texttt{FedAST}$ and $\texttt{Sync-ST}$ to attain the target accuracy levels in the experiment summary table (table:exp_summary) on $2$/$4$/$6$ tasks with CIFAR-10, Fashion-MNIST, MNIST, and Shakespeare datasets. $\texttt{FedAST}$ requires consistently lower wall-clock time for training compared to $\texttt{Sync-ST}$; the percentages represent these time gains.
  • ...and 21 more figures

Theorems & Definitions (17)

  • Theorem 1: Convergence of $\texttt{FedAST}$
  • Corollary 1.1: Asymptotic convergence after setting learning rates
  • Remark 1
  • Remark 2
  • Lemma 1
  • Lemma 2
  • Remark 3
  • Lemma 3
  • Lemma 4
  • Remark 4
  • ...and 7 more