FedEx: Expediting Federated Learning over Heterogeneous Mobile Devices by Overlapping and Participant Selection

Jiaxiang Geng, Boyu Li, Xiaoqi Qin, Yixuan Li, Liang Li, Yanzhao Hou, Miao Pan

TL;DR

FedEx tackles high federated learning (FL) training latency on heterogeneous mobile devices by combining overlapped computation and communication, bounded by a staleness ceiling, with an overlapping-aware participant selection (PS) strategy. It defines a latency-aware update protocol, a PS utility that accounts for the latency saved by overlapping, and a trigger based on model similarity that delays overlapping to avoid early model drift. The approach yields substantial latency reductions across multiple tasks while keeping memory usage in check, outperforming state-of-the-art PS methods and overlapping baselines in heterogeneous settings. This work provides a practical framework for deploying faster FL on real-world, device-diverse ecosystems and releases an open-source implementation to foster adoption and further improvements.

Abstract

Training latency is critical to the success of the many applications enabled by federated learning (FL) over heterogeneous mobile devices. By overlapping local gradient transmission with continuous local computing, FL can markedly reduce its training latency over homogeneous clients, yet it encounters severe model staleness, model drift, memory cost, and straggler issues in heterogeneous environments. To unleash the full potential of overlapping, we propose FedEx, a novel federated learning approach to expedite FL training over mobile devices under data, computing, and wireless heterogeneity. FedEx redefines the overlapping procedure with staleness ceilings to constrain memory consumption and make overlapping compatible with participant selection (PS) designs. FedEx then characterizes the PS utility function by accounting for the latency saved by overlapping, and provides a holistic PS solution to address the straggler issue. FedEx also introduces a simple but effective metric to trigger overlapping, in order to avoid model drift. Experimental results show that, compared with its peer designs, FedEx achieves substantial reductions in FL training latency over heterogeneous mobile devices with limited memory cost.
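
To make the latency argument concrete, the following is a minimal sketch of a toy timing model for a single client, not the FedEx protocol itself. It assumes fixed per-round compute and upload times, one uplink that sends updates in order, and a hypothetical `max_pending` cap on how many rounds may be started but not yet fully uploaded. That cap is only loosely analogous to the staleness ceiling in the abstract; the paper's actual definition is not given here.

```python
def sequential_latency(compute, upload, rounds):
    """Total time when each round computes, then uploads, with no overlap."""
    return rounds * (compute + upload)


def overlapped_latency(compute, upload, rounds, max_pending):
    """Total time when the client keeps computing while earlier updates upload.

    max_pending (hypothetical parameter): maximum number of rounds that may be
    started but not yet fully uploaded. max_pending=1 reduces to the
    sequential schedule.
    """
    upload_done = []  # finish time of each round's upload
    t_cpu = 0.0       # time at which the processor becomes free
    t_link = 0.0      # time at which the uplink becomes free
    for r in range(rounds):
        # Stall the processor until the backlog drops below the cap.
        if r >= max_pending:
            t_cpu = max(t_cpu, upload_done[r - max_pending])
        t_cpu += compute
        # The upload starts once the update exists and the link is idle.
        t_link = max(t_link, t_cpu) + upload
        upload_done.append(t_link)
    return t_link


if __name__ == "__main__":
    for cap in (1, 2, 4):
        print(cap, overlapped_latency(2, 3, rounds=4, max_pending=cap))
```

In this toy model a larger cap hides more upload time behind computation but also means more updates are buffered at once, which is the latency-versus-memory tension the abstract describes.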

Paper Structure

This paper contains 37 sections, 27 equations, 10 figures, 7 tables, 1 algorithm.

Figures (10)

  • Figure 1: Overlapping deficiencies due to heterogeneity.
  • Figure 2: Performance comparison: homo. vs heter. (CNN@MNIST, non-i.i.d. data).
  • Figure 3: The sketch of FedEx procedure and testbed.
  • Figure 4: Performance comparison in terms of testing accuracy and training latency under different learning tasks.
  • Figure 5: Staleness/memory vs FL training time.
  • ...and 5 more figures