Federated Learning: Challenges, Methods, and Future Directions

Tian Li, Anit Kumar Sahu, Ameet Talwalkar, Virginia Smith

TL;DR

Federated learning trains models across edge devices while the data stays local, which addresses privacy and bandwidth concerns. The paper surveys four core challenges (expensive communication, systems heterogeneity, statistical heterogeneity, and privacy) and reviews methods such as FedAvg and other local-updating schemes, compression, asynchronous schemes, and personalization, along with privacy techniques like differential privacy and secure computation. It highlights convergence issues under non-IID data, discusses modeling approaches (multi-task and meta-learning) and methods with convergence guarantees, and covers production considerations. It closes with open research directions and benchmarking needs for on-device learning and edge deployments.

Abstract

Federated learning involves training statistical models over remote devices or siloed data centers, such as mobile phones or hospitals, while keeping data localized. Training in heterogeneous and potentially massive networks introduces novel challenges that require a fundamental departure from standard approaches for large-scale machine learning, distributed optimization, and privacy-preserving data analysis. In this article, we discuss the unique characteristics and challenges of federated learning, provide a broad overview of current approaches, and outline several directions of future work that are relevant to a wide range of research communities.

Paper Structure

This paper contains 21 sections, 1 equation, 6 figures.

Figures (6)

  • Figure 1: An example application of federated learning for the task of next-word prediction on mobile phones. To preserve the privacy of the text data and to reduce strain on the network, we seek to train a predictor in a distributed fashion, rather than sending the raw data to a central server. In this setup, remote devices communicate with a central server periodically to learn a global model. At each communication round, a subset of selected phones performs local training on their non-identically-distributed user data, and sends these local updates to the server. After incorporating the updates, the server then sends back the new global model to another subset of devices. This iterative training process continues across the network until convergence is reached or some stopping criterion is met.
  • Figure 2: Left: Distributed (mini-batch) SGD. Each device, $k$, locally computes gradients from a mini-batch of data points to approximate $\nabla F_k(w)$, and the aggregated mini-batch updates are applied on the server. Right: Local updating schemes. Each device immediately applies local updates, e.g., gradients, after they are computed and a server performs a global aggregation after a variable number of local updates. Local-updating schemes can reduce communication by performing additional work locally.
  • Figure 3: Centralized vs. decentralized topologies. In the typical federated learning setting and as a focus of this article, we assume a star network (left) where a server connects with all remote devices. Decentralized topologies (right) are a potential alternative when communication to the server becomes a bottleneck.
  • Figure 4: Systems heterogeneity in federated learning. Devices may vary in terms of network connection, power, and hardware. Moreover, some of the devices may drop at any time during training. Therefore, federated training methods must tolerate heterogeneous systems environments and low participation of devices, i.e., they must allow for only a small subset of devices to be active at each round.
  • Figure 5: Different modeling approaches in federated networks. Depending on properties of the data, network, and application of interest, one may choose to (a) learn separate models for each device, (b) fit a single global model to all devices, or (c) learn related but distinct models in the network.
  • ...and 1 more figure
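
The training loop described in the captions of Figures 1 and 2 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the scalar model y ≈ w * x, the synthetic per-device data, and the names `local_update`, `federated_round`, `make_devices`, and `train` are all hypothetical. Each round, the server samples a subset of devices, each selected device applies several local gradient steps, and the server averages the returned models weighted by local sample count, in the style of FedAvg.

```python
import random


def local_update(w, data, lr=0.5, local_steps=5):
    """Run a few gradient steps on one device's data for the scalar model y ~ w * x."""
    for _ in range(local_steps):
        # Gradient of the local loss 0.5 * mean((w * x - y) ** 2) with respect to w.
        grad = sum((w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_round(w_global, devices, rng, fraction=0.5, **local_kwargs):
    """One communication round: sample devices, train locally, average by sample count."""
    m = max(1, int(fraction * len(devices)))
    selected = rng.sample(devices, m)  # only a subset is active in each round
    total = sum(len(d) for d in selected)
    return sum(
        len(d) * local_update(w_global, d, **local_kwargs) for d in selected
    ) / total


def make_devices(rng, num_devices=10, w_true=3.0):
    """Hypothetical non-identically-distributed data: each device has its own input range."""
    devices = []
    for k in range(num_devices):
        n_k = rng.randint(5, 20)   # devices hold different amounts of data
        scale = 0.2 + 0.1 * k      # device-specific input scale
        xs = [rng.uniform(-scale, scale) for _ in range(n_k)]
        devices.append([(x, w_true * x) for x in xs])
    return devices


def train(rounds=200, seed=0):
    """Repeat communication rounds from w = 0 and return the final global model."""
    rng = random.Random(seed)
    devices = make_devices(rng)
    w = 0.0
    for _ in range(rounds):
        w = federated_round(w, devices, rng)
    return w
```

Calling `train()` should return a value close to the generating weight 3.0, because the synthetic data is noiseless. Setting `local_steps=1` reduces each round to a single aggregated gradient step, which corresponds to the left-hand scheme of Figure 2 (here with full local batches rather than mini-batches). Larger values of `local_steps` trade extra local computation for fewer communication rounds.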