FedFetch: Faster Federated Learning with Adaptive Downstream Prefetching
Qifan Yan, Andrew Liu, Shiqi He, Mathias Lécuyer, Ivan Beschastnikh
TL;DR
FedFetch addresses the downstream communication bottleneck in cross-device Federated Learning by introducing Prepare and Prefetch phases in which clients download model states ahead of the rounds in which they will train. The method adaptively schedules per-client prefetches based on bandwidth profiles and a time-limit constraint, reducing client-side model staleness and the download load at training time. Empirical results show FedFetch yields about $1.26\times$ end-to-end speedup and $4.49\times$ faster fetch times across several compression techniques, with a moderate $\sim12\%$ bandwidth overhead and strong compatibility with existing sampling and compression methods. This approach benefits real-world FL deployments by mitigating straggler effects and improving downstream efficiency without sacrificing convergence or accuracy.
Abstract
Federated learning (FL) is a machine learning paradigm that facilitates massively distributed model training with end-user data on edge devices directed by a central server. However, the large number of heterogeneous clients in FL deployments leads to a communication bottleneck between the server and the clients. This bottleneck is made worse by straggling clients, any one of which will further slow down training. To tackle these challenges, researchers have proposed techniques like client sampling and update compression. These techniques work well in isolation but combine poorly in the downstream, server-to-client direction. This is because unselected clients have outdated local model states and need to synchronize these states with the server first. We introduce FedFetch, a strategy to mitigate the download time overhead caused by combining client sampling and compression techniques. FedFetch achieves this with an efficient prefetch schedule for clients to prefetch model states multiple rounds before a stated training round. We empirically show that adding FedFetch to communication-efficient FL techniques reduces end-to-end training time by 1.26$\times$ and download time by 4.49$\times$ across compression techniques with heterogeneous client settings. Our implementation is available at https://github.com/DistributedML/FedFetch.
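To make the idea of a bandwidth-aware prefetch schedule concrete, the following is a minimal sketch and not the FedFetch algorithm itself. It assumes each client has a known downstream bandwidth estimate, that the model state to download has a fixed size, and that a client may spend at most a fixed time limit per round on prefetching. The names `Client`, `rounds_ahead`, and `prefetch_schedule` are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Client:
    client_id: str
    bandwidth_mbps: float  # assumed downstream bandwidth estimate, in megabits/s


def rounds_ahead(client: Client, state_size_mb: float, time_limit_s: float) -> int:
    """Rounds of lead time a client needs to finish downloading the state.

    Assumption (illustrative only): the client may spend at most
    `time_limit_s` seconds per round on prefetching.
    """
    download_s = state_size_mb * 8 / client.bandwidth_mbps  # megabytes -> megabits
    return max(1, math.ceil(download_s / time_limit_s))


def prefetch_schedule(clients, state_size_mb, time_limit_s, target_round):
    """Map each client id to the round in which its prefetch should start."""
    return {
        c.client_id: max(0, target_round - rounds_ahead(c, state_size_mb, time_limit_s))
        for c in clients
    }


if __name__ == "__main__":
    clients = [Client("fast", 100.0), Client("slow", 5.0)]
    # 50 MB state, 10 s prefetch budget per round, training in round 20.
    print(prefetch_schedule(clients, 50.0, 10.0, 20))
```

Under these assumptions a slow client starts prefetching several rounds earlier than a fast one, so both hold an up-to-date model state when the training round begins. This simplified sketch leaves out details such as how bandwidth profiles are obtained and how staleness is handled.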
