FedAST: Federated Asynchronous Simultaneous Training
Baris Askin, Pranay Sharma, Carlee Joe-Wong, Gauri Joshi
TL;DR
FedAST trains multiple FL models concurrently on a shared pool of clients using buffered asynchronous aggregation and dynamic cross-task resource allocation. It combines a server-side Realloc mechanism with per-task buffers to balance task heterogeneity against update staleness, and uses a virtual sequence analysis to derive convergence guarantees for smooth non-convex objectives. The convergence bound decomposes into FedAvg-like terms and a term due to asynchronous aggregation; with appropriate learning rates and buffer sizes, FedAST retains favorable convergence rates while reducing wall-clock training time. Empirically, FedAST outperforms synchronous and asynchronous baselines on MNIST, Fashion-MNIST, CIFAR-10/100, and Shakespeare, reducing the time to train multiple tasks to completion by up to 46.0%, which demonstrates the practical value of dynamic allocation and buffering in multi-task federated settings.
Abstract
Federated Learning (FL) enables edge devices or clients to collaboratively train machine learning (ML) models without sharing their private data. Much of the existing work in FL focuses on efficiently learning a model for a single task. In this paper, we study simultaneous training of multiple FL models using a common set of clients. The few existing simultaneous training methods employ synchronous aggregation of client updates, which can cause significant delays because large models and/or slow clients can bottleneck the aggregation. On the other hand, a naive asynchronous aggregation is adversely affected by stale client updates. We propose FedAST, a buffered asynchronous federated simultaneous training algorithm that overcomes bottlenecks from slow models and adaptively allocates client resources across heterogeneous tasks. We provide theoretical convergence guarantees for FedAST for smooth non-convex objective functions. Extensive experiments over multiple real-world datasets demonstrate that our proposed method outperforms existing simultaneous FL approaches, achieving up to 46.0% reduction in time to train multiple tasks to completion.
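To make the idea of buffered asynchronous aggregation concrete, the following is a minimal sketch of a server that keeps one buffer per task and updates a task's model only once that task's buffer is full. It is an illustration under stated assumptions, not the FedAST algorithm itself: the names `TaskState`, `BufferedAsyncServer`, and `receive`, the plain averaging rule, and the additive update with a server learning rate are all hypothetical choices, and the cross-task client reallocation step is omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TaskState:
    """Server-side state for one task (hypothetical structure)."""
    model: List[float]          # current global model, flattened
    buffer_size: int            # number of client updates per aggregation
    server_lr: float = 1.0      # assumed server learning rate
    version: int = 0            # number of aggregations performed so far
    buffer: List[List[float]] = field(default_factory=list)


class BufferedAsyncServer:
    """Keeps an independent update buffer for each task."""

    def __init__(self, tasks: Dict[str, TaskState]) -> None:
        self.tasks = tasks

    def receive(self, task_id: str, delta: List[float], base_version: int) -> int:
        """Store one client update and aggregate if the buffer is full.

        `delta` is assumed to be the client's local model minus the model
        it started from. Returns the staleness of the update, i.e. how many
        aggregations happened since the client fetched the model.
        """
        task = self.tasks[task_id]
        staleness = task.version - base_version
        task.buffer.append(delta)
        if len(task.buffer) >= task.buffer_size:
            n = len(task.buffer)
            # Coordinate-wise mean of the buffered updates.
            mean = [sum(col) / n for col in zip(*task.buffer)]
            task.model = [w + task.server_lr * d for w, d in zip(task.model, mean)]
            task.buffer.clear()
            task.version += 1
        return staleness
```

Because each task has its own buffer, a slow or large task does not delay aggregation for the others: a task with `buffer_size=1` updates on every arriving client result, while a task with a larger buffer waits until enough updates have accumulated. The returned staleness is only reported here; how stale updates are weighted or bounded is a design choice this sketch does not address.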
