Towards Federated Learning at Scale: System Design

Keith Bonawitz, Hubert Eichner, Wolfgang Grieskamp, Dzmitry Huba, Alex Ingerman, Vladimir Ivanov, Chloe Kiddon, Jakub Konečný, Stefano Mazzocchi, H. Brendan McMahan, Timon Van Overveldt, David Petrou, Daniel Ramage, Jason Roselander

TL;DR

The paper tackles scalable Federated Learning on consumer devices, addressing the privacy and coordination challenges of training on decentralized mobile data. It presents a production system built on TensorFlow that uses synchronous rounds, a robust server based on the Actor Model, device attestation, and secure aggregation. The work reports on a deployed fleet reaching tens of millions of devices and demonstrates practical applications such as on-device keyboard ranking and next-word prediction with live metrics. It also discusses analytics, tooling, and open research directions, including bias, parallelism beyond hundreds of devices, and broader Federated Computation opportunities.

Abstract

Federated Learning is a distributed machine learning approach which enables model training on a large corpus of decentralized data. We have built a scalable production system for Federated Learning in the domain of mobile devices, based on TensorFlow. In this paper, we describe the resulting high-level design, sketch some of the challenges and their solutions, and touch upon the open problems and future directions.

Paper Structure

This paper contains 26 sections, 9 figures, 1 table, and 1 algorithm.

Figures (9)

  • Figure 1: Federated Learning Protocol
  • Figure 2: Device Architecture
  • Figure 3: Actors in the FL Server Architecture
  • Figure 4: Model Engineer Workflow
  • Figure 5: Round Completion Rate
  • ...and 4 more figures