FedEx: Expediting Federated Learning over Heterogeneous Mobile Devices by Overlapping and Participant Selection
Jiaxiang Geng, Boyu Li, Xiaoqi Qin, Yixuan Li, Liang Li, Yanzhao Hou, Miao Pan
TL;DR
FedEx reduces federated learning (FL) training latency on heterogeneous mobile devices by overlapping local computation with gradient transmission, bounded by a staleness ceiling, and pairing it with an overlapping-aware participant selection (PS) strategy. It defines a latency-aware update protocol, a PS utility that accounts for the latency saved by overlapping, and a model-similarity-based trigger for overlapping that avoids early model drift. The approach yields substantial latency reductions across multiple tasks while keeping memory usage in check, outperforming state-of-the-art PS methods and overlapping baselines in heterogeneous settings. This work provides a practical framework for deploying faster FL on real-world, device-diverse ecosystems and releases an open-source implementation to foster adoption and further improvements.
Abstract
Training latency is critical to the success of the many applications enabled by federated learning (FL) over heterogeneous mobile devices. By overlapping local gradient transmission with continuous local computing, FL can substantially reduce its training latency over homogeneous clients, yet it encounters severe model staleness, model drift, memory cost, and straggler issues in heterogeneous environments. To unleash the full potential of overlapping, we propose FedEx, a novel federated learning approach to expedite FL training over mobile devices under data, computing, and wireless heterogeneity. FedEx redefines the overlapping procedure with staleness ceilings to constrain memory consumption and make overlapping compatible with participant selection (PS) designs. Then, FedEx characterizes the PS utility function by considering the latency reduced by overlapping, and provides a holistic PS solution to address the straggler issue. FedEx also introduces a simple but effective metric to trigger overlapping, in order to avoid model drift. Experimental results show that, compared with its peer designs, FedEx achieves substantial reductions in FL training latency over heterogeneous mobile devices with limited memory cost.
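
To make the latency argument concrete, the following is a minimal sketch of a toy pipeline model, not the FedEx protocol itself. It assumes each client has a fixed per-round compute time and upload time, ignores the staleness ceiling, downlink time, and aggregation cost, and treats total latency as that of the slowest selected client. All function names and numbers are hypothetical illustrations.

```python
from typing import Callable, List, Tuple


def sequential_latency(compute_s: float, upload_s: float, rounds: int) -> float:
    """Total time when each round computes first and then uploads."""
    return rounds * (compute_s + upload_s)


def overlapped_latency(compute_s: float, upload_s: float, rounds: int) -> float:
    """Total time when the upload of round r runs during the compute of round r+1.

    Toy two-stage pipeline: the slower stage sets the steady-state pace,
    plus one compute to fill the pipeline and one upload to drain it.
    """
    if rounds <= 0:
        return 0.0
    return compute_s + (rounds - 1) * max(compute_s, upload_s) + upload_s


def straggler_latency(
    clients: List[Tuple[float, float]],
    rounds: int,
    latency_fn: Callable[[float, float, int], float],
) -> float:
    """Latency of the slowest client, which bounds the whole training run."""
    return max(latency_fn(c, u, rounds) for c, u in clients)


# Hypothetical (compute_s, upload_s) pairs for a fast and a slow device.
clients = [(2.0, 1.0), (4.0, 3.0)]
rounds = 10

print(straggler_latency(clients, rounds, sequential_latency))  # 70.0
print(straggler_latency(clients, rounds, overlapped_latency))  # 43.0
```

Even in this simplified model the slow device still dominates the overlapped total, which illustrates why overlapping alone does not remove the straggler issue and why the abstract pairs it with participant selection.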
