A Field Guide to Federated Optimization

Jianyu Wang, Zachary Charles, Zheng Xu, Gauri Joshi, H. Brendan McMahan, Blaise Aguera y Arcas, Maruan Al-Shedivat, Galen Andrew, Salman Avestimehr, Katharine Daly, Deepesh Data, Suhas Diggavi, Hubert Eichner, Advait Gadhikar, Zachary Garrett, Antonious M. Girgis, Filip Hanzely, Andrew Hard, Chaoyang He, Samuel Horvath, Zhouyuan Huo, Alex Ingerman, Martin Jaggi, Tara Javidi, Peter Kairouz, Satyen Kale, Sai Praneeth Karimireddy, Jakub Konecny, Sanmi Koyejo, Tian Li, Luyang Liu, Mehryar Mohri, Hang Qi, Sashank J. Reddi, Peter Richtarik, Karan Singhal, Virginia Smith, Mahdi Soltanolkotabi, Weikang Song, Ananda Theertha Suresh, Sebastian U. Stich, Ameet Talwalkar, Hongyi Wang, Blake Woodworth, Shanshan Wu, Felix X. Yu, Honglin Yuan, Manzil Zaheer, Mi Zhang, Tong Zhang, Chunxiang Zheng, Chen Zhu, Wennan Zhu

TL;DR

This field guide articulates practical principles for formulating, evaluating, and implementing federated optimization under stringent privacy, communication, and system constraints. It emphasizes concrete algorithmic design (e.g., FedAvg generalizations, FedOpt, SCAFFOLD, FedProx), realistic simulations, and system-aware evaluation to bridge theory and practice. The paper catalogs problem formulations, representative techniques, evaluation methodologies, and deployment considerations, while highlighting open gaps between theory and practice. Its integrative perspective aims to empower researchers and practitioners to craft robust, scalable, and privacy-preserving federated optimization methods for diverse applications.

Abstract

Federated learning and analytics are a distributed approach for collaboratively learning models (or statistics) from decentralized data, motivated by and designed for privacy protection. The distributed learning process can be formulated as solving federated optimization problems, which emphasize communication efficiency, data heterogeneity, compatibility with privacy and system requirements, and other constraints that are not primary considerations in other problem settings. This paper provides recommendations and guidelines on formulating, designing, evaluating and analyzing federated optimization algorithms through concrete examples and practical implementation, with a focus on conducting effective simulations to infer real-world performance. The goal of this work is not to survey the current literature, but to inspire researchers and practitioners to design federated learning algorithms that can be used in various practical applications.

Paper Structure

This paper contains 151 sections, 3 theorems, 34 equations, 22 figures, 1 table, 1 algorithm.

Key Result

Lemma 1

Assuming the client learning rate satisfies $\eta \leq \frac{1}{4L}$, then [...], where $\mathcal{F}^{(t,0)}$ is the $\sigma$-field representing all the historical information up to the start of the $t$-th round.

Figures (22)

  • Figure 1: Test accuracy on Stack Overflow for Algorithms A, B, and C. The best and second-best performing client and server learning rate combinations are selected for each algorithm based on validation set accuracy (see Figure 2). The dark lines indicate the test set accuracy for the best-performing combination (given in the legend), with shading indicating the gap between the best and second-best performing runs. This shaded gap indicates the relative sensitivity of the different algorithms to the resolution of the hyperparameter tuning grid; here we see that Algorithms B and C both perform better and may be easier to tune.
  • Figure 2: Validation accuracy on Stack Overflow at the last training round (2000), for various client and server learning rates. Results for Algorithms A, B, and C are given in the left, middle, and right plots, respectively. The best and second-best $(\eta_s, \eta)$ combinations are used in Figure 1.
  • Figure 3: Test accuracy on GLD-23k for a total of 1000, 5000, and 20,000 communication rounds (left, middle, and right, respectively). The best and second-best performing client and server learning rate combinations are selected for each algorithm based on the test set accuracy after the final communication round. The dark lines indicate the test set accuracy for the best-performing combination (given in the legend), with shading indicating the gap between the best and second-best performing runs.
  • Figure 4: Test accuracy on GLD-23k for Algorithms A (left) and C (right) for various numbers of local epochs per round $E$, versus the number of communication rounds. We set $\eta = 0.1, \eta_s = 1.0$ for Algorithm A, and $\eta = 0.01$, $\eta_s = 10^{-5/2}$ for Algorithm C.
  • Figure 5: Test accuracy on GLD-23k for Algorithms A (left) and C (right) for various numbers of local epochs per round, versus the total number of examples processed by all clients. We set $\eta = 0.1, \eta_s = 1.0$ for Algorithm A, and $\eta = 0.01$, $\eta_s = 10^{-5/2}$ for Algorithm C. These learning rates were chosen as they performed well for $E = 2$. While they also performed near-optimally for other values of $E$, tuning the learning rates jointly with $E$ may produce slightly better results.
  • ...and 17 more figures
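
The captions above refer to a client learning rate $\eta$, a server learning rate $\eta_s$, and a number of local epochs per round $E$. As a rough illustration of where these three knobs enter a generalized FedAvg round, here is a minimal sketch under several assumptions that are not taken from the paper: plain gradient descent on both client and server, toy noiseless least-squares clients, full-batch local steps (so one "epoch" is one gradient step), and the sign convention that a client returns its local model minus the global model. The names `client_update`, `fedavg_round`, and `make_toy_clients` are hypothetical, and the sketch is not meant to reproduce Algorithms A, B, or C.

```python
import numpy as np


def make_toy_clients(num_clients=5, dim=3, seed=0):
    """Hypothetical toy data: each client holds a noiseless least-squares problem."""
    rng = np.random.default_rng(seed)
    x_true = rng.normal(size=dim)
    clients = []
    for _ in range(num_clients):
        n = int(rng.integers(30, 60))   # clients hold different amounts of data
        A = rng.normal(size=(n, dim))
        clients.append((A, A @ x_true))  # all clients share the minimizer x_true
    return clients, x_true


def client_update(x_global, A, b, eta, local_epochs):
    """Run `local_epochs` full-batch gradient steps with client learning rate eta."""
    x = x_global.copy()
    n = len(b)
    for _ in range(local_epochs):
        grad = A.T @ (A @ x - b) / n     # gradient of 0.5 * mean squared residual
        x -= eta * grad
    return x - x_global                  # model delta reported to the server


def fedavg_round(x_global, clients, eta, eta_s, local_epochs):
    """One round: local training on every client, then a server step of size eta_s."""
    deltas = [client_update(x_global, A, b, eta, local_epochs) for A, b in clients]
    weights = np.array([len(b) for _, b in clients], dtype=float)
    avg_delta = np.average(deltas, axis=0, weights=weights)  # weight by data size
    return x_global + eta_s * avg_delta
```

With `eta_s = 1.0` the server simply adopts the weighted average of the client models, which is the original FedAvg update. Other server optimizers would replace the last line of `fedavg_round` with their own step on `avg_delta`.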

Theorems & Definitions (7)

  • Remark 1: Communication Efficiency
  • Lemma 1: Per Round Progress
  • Lemma 2: Bounded Client Drift
  • Theorem 1: Convergence Rate for Convex Local Functions
  • Remark 2
  • Proof of Lemma 1
  • Proof of Lemma 2