FedFetch: Faster Federated Learning with Adaptive Downstream Prefetching

Qifan Yan, Andrew Liu, Shiqi He, Mathias Lécuyer, Ivan Beschastnikh

TL;DR

FedFetch addresses the downstream communication bottleneck in cross-device Federated Learning by introducing Prepare and Prefetch phases, in which clients download model states ahead of future training rounds. The method adaptively schedules per-client prefetches based on bandwidth profiles and a time-limit constraint, reducing both client-side model staleness and the download load during training. Empirical results show that FedFetch yields about a $1.26\times$ end-to-end speedup and $4.49\times$ faster fetch times across several compression techniques, with a moderate bandwidth overhead of $\sim12\%$ and strong compatibility with existing sampling and compression methods. The approach benefits real-world FL deployments by mitigating straggler effects and improving downstream efficiency without sacrificing convergence or accuracy.

Abstract

Federated learning (FL) is a machine learning paradigm that facilitates massively distributed model training with end-user data on edge devices directed by a central server. However, the large number of heterogeneous clients in FL deployments leads to a communication bottleneck between the server and the clients. This bottleneck is made worse by straggling clients, any one of which will further slow down training. To tackle these challenges, researchers have proposed techniques like client sampling and update compression. These techniques work well in isolation but combine poorly in the downstream, server-to-client direction. This is because unselected clients have outdated local model states and need to synchronize these states with the server first. We introduce FedFetch, a strategy to mitigate the download time overhead caused by combining client sampling and compression techniques. FedFetch achieves this with an efficient prefetch schedule for clients to prefetch model states multiple rounds before a stated training round. We empirically show that adding FedFetch to communication-efficient FL techniques reduces end-to-end training time by $1.26\times$ and download time by $4.49\times$ across compression techniques with heterogeneous client settings. Our implementation is available at https://github.com/DistributedML/FedFetch
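
To make the idea of a bandwidth-aware prefetch schedule concrete, the following is a minimal illustrative sketch and not the algorithm from the paper or its repository. It assumes a hypothetical `Client` record with a known download bandwidth and a known amount of outstanding model state, a hypothetical per-round prefetch time limit `time_limit_s`, and a hypothetical cap `max_rounds` on how far ahead a client may start. Under those assumptions, slower clients start prefetching earlier, and whatever they cannot prefetch within the cap remains to be fetched at training time.

```python
import math
from dataclasses import dataclass


@dataclass
class Client:
    # All fields are illustrative assumptions, not names from FedFetch.
    name: str
    bandwidth_mb_s: float  # assumed download bandwidth in MB/s
    stale_mb: float        # assumed size of outstanding model state in MB


def rounds_ahead(client: Client, time_limit_s: float, max_rounds: int) -> int:
    """Number of rounds before training at which the client starts prefetching.

    Each prefetch round is assumed to allow at most `time_limit_s` seconds
    of download, so a client can pull bandwidth * time_limit MB per round.
    """
    per_round_mb = client.bandwidth_mb_s * time_limit_s
    needed = math.ceil(client.stale_mb / per_round_mb)
    # Clamp to at least one round and at most the allowed lookahead.
    return max(1, min(needed, max_rounds))


def residual_fetch_mb(client: Client, time_limit_s: float, max_rounds: int) -> float:
    """Data still to be downloaded at training time after prefetching."""
    k = rounds_ahead(client, time_limit_s, max_rounds)
    prefetched = min(client.stale_mb, k * client.bandwidth_mb_s * time_limit_s)
    return client.stale_mb - prefetched


if __name__ == "__main__":
    clients = [Client("fast", 10.0, 40.0), Client("slow", 1.0, 40.0)]
    for c in clients:
        k = rounds_ahead(c, time_limit_s=5.0, max_rounds=3)
        left = residual_fetch_mb(c, time_limit_s=5.0, max_rounds=3)
        print(f"{c.name}: start {k} round(s) ahead, {left} MB left to fetch")
```

In this toy setting the fast client finishes its download in a single prefetch round, while the slow client uses the full lookahead and still has data left to fetch during training, which is the straggler effect that prefetching is meant to reduce.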

Paper Structure

This paper contains 31 sections, 4 equations, 10 figures, 3 tables, 2 algorithms.

Figures (10)

  • Figure 1: Cross-device FL Characteristics.
  • Figure 2: Effect of combining client sampling and compression.
  • Figure 3: FedFetch Design. The goal of FedFetch is to minimize the amount of time clients spend on model download during their Train phase ("Fetch" in orange in the diagram). FedFetch introduces two new phases: Prepare and Prefetch. During Prepare, clients are presampled by the server and provided with a prefetch schedule. During Prefetch, each client prefetches model state ("Prefetch" in green in the diagram) from the server before their Train phase starts.
  • Figure 4: An example of a prefetch process for six clients and $R=3$. The blocks represent what each client is currently prefetching (in green) or fetching (in orange) from the server.
  • Figure 5: Bandwidth usage of select FL techniques with and without FedFetch. Techniques ending with "+FF" apply FedFetch. Each bar is divided into Fetch, Up(load), and Prefetch.
  • ...and 5 more figures