WHALE-FL: Wireless and Heterogeneity Aware Latency Efficient Federated Learning over Mobile Devices via Adaptive Subnetwork Scheduling
Huai-an Su, Jiaxiang Geng, Liang Li, Xiaoqi Qin, Yanzhao Hou, Hao Wang, Xin Fu, Miao Pan
TL;DR
Federated learning on mobile devices suffers from compute and wireless heterogeneity, which produces stragglers and long training latency. WHALE-FL introduces adaptive, width-based subnetwork scheduling guided by a joint utility that combines system efficiency with training efficiency, the latter measured via windowed Fisher information. This lets each device choose its subnetwork size in every round. The approach maps the utility to a discrete set of subnetworks and aggregates the resulting heterogeneous updates. A WHALE-FL prototype reduces training latency by ≈1.5x–2.1x across MNIST, CIFAR-10, HAR, and WikiText-2 without sacrificing accuracy, and the paper also analyzes Fisher information dynamics and hyperparameter sensitivity. By accounting for dynamic system and training demands in the scheduling policy, this work enables scalable, fast FL on real-world heterogeneous mobile devices.
Abstract
As a popular distributed learning paradigm, federated learning (FL) over mobile devices fosters numerous applications, but its practical deployment is hindered by the computing and communication heterogeneity of participating devices. Some pioneering research efforts proposed extracting subnetworks from the global model and assigning each device as large a subnetwork as possible for local training, based on its full computing and communication capacity. Although such fixed-size subnetwork assignment enables FL training over heterogeneous mobile devices, it is unaware of (i) the dynamic changes in devices' communication and computing conditions and (ii) FL training progress and its dynamic requirements for local training contributions, both of which may cause very long FL training delays. Motivated by these dynamics, in this paper we develop a wireless and heterogeneity aware latency efficient FL (WHALE-FL) approach to accelerate FL training through adaptive subnetwork scheduling. Instead of sticking to a fixed-size subnetwork, WHALE-FL introduces a novel subnetwork selection utility function that captures device and FL training dynamics, and guides each mobile device to adaptively select the subnetwork size for local training based on (a) its computing and communication capacity, (b) its dynamic computing and/or communication conditions, and (c) FL training status and its corresponding requirements for local training contributions. Our evaluation shows that, compared with peer designs, WHALE-FL effectively accelerates FL training without sacrificing learning accuracy.
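To make the idea of adaptive subnetwork selection concrete, the following is a minimal Python sketch of how a per-round utility could be mapped to a discrete subnetwork width. It is not the paper's formulation: the width set `WIDTHS`, the two scoring functions, the weight `alpha`, and the assumption that a higher recent Fisher value calls for a larger subnetwork are all hypothetical choices made for illustration only.

```python
# Hypothetical width ratios of the subnetworks extracted from the global model.
WIDTHS = [0.0625, 0.125, 0.25, 0.5, 1.0]


def system_utility(round_latency_s, target_latency_s):
    """Illustrative system score in (0, 1]: 1.0 when the device meets the
    target round latency, smaller the slower it currently runs."""
    return min(1.0, target_latency_s / round_latency_s)


def training_utility(fisher_window):
    """Illustrative training score in [0, 1]: the most recent Fisher value
    relative to the largest value in the window (assumed non-negative)."""
    peak = max(fisher_window)
    return fisher_window[-1] / peak if peak > 0 else 0.0


def select_width(round_latency_s, target_latency_s, fisher_window,
                 alpha=0.5, max_width=1.0):
    """Combine both scores and map the result to one discrete width.

    max_width stands in for the device's full capacity, so the selected
    subnetwork never exceeds what the device could train at best.
    """
    u = (alpha * system_utility(round_latency_s, target_latency_s)
         + (1.0 - alpha) * training_utility(fisher_window))
    allowed = [w for w in WIDTHS if w <= max_width]
    # Split [0, 1] into equal bins, one per allowed width.
    idx = min(int(u * len(allowed)), len(allowed) - 1)
    return allowed[idx]


# A fast device late in a high-Fisher phase gets the full model; a device
# running 4x slower than the target, with decaying Fisher values, gets a
# much narrower subnetwork.
fast = select_width(1.0, 1.0, [1.0, 2.0, 2.0])
slow = select_width(4.0, 1.0, [4.0, 2.0, 1.0])
```

Under these assumptions, the selected width changes from round to round as either the measured latency or the Fisher window changes, which is the behavior a fixed-size assignment cannot provide.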
