Asynchronous Multi-Model Dynamic Federated Learning over Wireless Networks: Theory, Modeling, and Optimization
Zhan-Lun Chang, Seyyedali Hosseinalipour, Mung Chiang, Christopher G. Brinton
TL;DR
This work addresses the challenge of performing asynchronous, multi-task federated learning over wireless networks with dynamic data statistics. It introduces DMA-FL, which uses scheduling tensors and rectangular functions to model device participation and data drift across tasks, and develops a convergence analysis linking these factors to learning performance. A joint resource allocation and device scheduling optimization is formulated to balance model quality and energy consumption, and is solved via constraint relaxation and successive convex approximation with convergence guarantees. Numerical experiments on MNIST, Fashion-MNIST, and SVHN show that DMA-FL achieves better performance-energy trade-offs than baseline asynchronous and synchronous FL methods, particularly under significant data drift and task heterogeneity. The approach offers a principled framework for deploying multi-task FL in practical edge networks with responsive, energy-aware learning.
Abstract
Federated learning (FL) has emerged as a key technique for distributed machine learning (ML). Most literature on FL has focused on ML model training for (i) a single task/model, with (ii) a synchronous scheme for updating model parameters, and (iii) a static data distribution across devices. These assumptions are often unrealistic in practical wireless environments. To address these limitations, we develop DMA-FL, which considers dynamic FL with multiple downstream tasks/models over an asynchronous model update architecture. We first characterize convergence by introducing scheduling tensors and rectangular functions to capture the impact of system parameters on learning performance. Our analysis sheds light on the joint impact of device training variables (e.g., number of local gradient descent steps), asynchronous scheduling decisions (i.e., when a device trains a task), and dynamic data drifts on the performance of ML training for different tasks. Leveraging these results, we formulate an optimization for jointly configuring resource allocation and device scheduling to strike an efficient trade-off between energy consumption and ML performance. Our solver for the resulting non-convex mixed-integer program employs constraint relaxations and successive convex approximations with convergence guarantees. Through numerical experiments, we reveal that DMA-FL substantially improves the performance-efficiency trade-off.
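
To make the idea of scheduling tensors and rectangular functions more concrete, the following is a minimal illustrative sketch, not the paper's actual formulation. It assumes a discrete time horizon and a binary tensor indexed by (device, task, time slot), where each training interval is a half-open window [start, end) encoded by a rectangular function. The names `rect`, `build_schedule`, and `windows` are hypothetical and chosen only for this example.

```python
def rect(t, start, end):
    """Rectangular (boxcar) function: 1 if start <= t < end, else 0."""
    return 1 if start <= t < end else 0


def build_schedule(num_devices, num_tasks, horizon, windows):
    """Build a binary scheduling tensor X[device][task][t].

    `windows` is a list of (device, task, start, end) tuples, each marking
    an interval during which the device trains the given task. This layout
    is an assumption made for illustration only.
    """
    X = [[[0] * horizon for _ in range(num_tasks)] for _ in range(num_devices)]
    for device, task, start, end in windows:
        for t in range(horizon):
            # OR-in the window so several intervals per (device, task) can coexist.
            X[device][task][t] |= rect(t, start, end)
    return X


# Example: 2 devices, 2 tasks, 6 time slots.
# Device 0 trains task 0 during slots 0-2, then task 1 during slots 3-5.
# Device 1 trains task 0 during slots 2-4 and never trains task 1.
windows = [(0, 0, 0, 3), (0, 1, 3, 6), (1, 0, 2, 5)]
X = build_schedule(2, 2, 6, windows)
```

In this toy encoding, asynchrony shows up as devices whose windows for the same task start and end at different times, and a per-slot sum over the task axis can be used to check that no device is assigned more than one task at once (an assumption of this sketch, not a claim about the paper's constraints).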
