
A Unified Linear Speedup Analysis of Federated Averaging and Nesterov FedAvg

Zhaonan Qu, Kaixiang Lin, Zhaojian Li, Jiayu Zhou, Zhengyuan Zhou

TL;DR

This work provides a unified linear-speedup analysis for Federated Averaging (FedAvg) and its Nesterov-accelerated variant across strongly convex, convex, and overparameterized settings under data and system heterogeneity. By carefully bounding one-step progress and the divergence across clients, it establishes sharp rates, stated in terms of the total number of iterations $T$, that show linear speedup in the number of active devices $K$, including under partial participation. It also delivers the first linear-speedup guarantees for Nesterov FedAvg in convex settings and shows geometric convergence in the overparameterized regime, with explicit rates that depend on problem conditioning and participation. Complemented by numerical experiments, these results provide practical guidance on choosing local steps and participation to balance communication costs with convergence speed in real-world federated systems.
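The following is a minimal Python sketch of FedAvg with partial participation on a toy problem. It is meant only to make the setting concrete and is not the authors' code. Everything in it is an assumption for illustration:

- the hypothetical function `fedavg_partial`;
- the one-dimensional quadratic client objectives $F_k(w)=\tfrac12 a_k(w-b_k)^2$;
- additive Gaussian gradient noise;
- the sampling rule, in which $K$ devices are drawn with replacement according to $p_k$ and then averaged uniformly.

The step size follows the decaying schedule $\alpha_t=\frac{4}{\mu(\gamma+t)}$ with $\gamma=\max\{32\kappa,E\}$ from Theorem 1.

```python
import random
from typing import Callable, List


def fedavg_partial(
    a: List[float],
    b: List[float],
    p: List[float],
    K: int,
    E: int,
    rounds: int,
    lr_fn: Callable[[int], float],
    noise: float = 0.0,
    seed: int = 0,
) -> float:
    """Toy FedAvg on hypothetical 1-D client objectives F_k(w) = 0.5*a[k]*(w - b[k])**2.

    Each round samples K devices with replacement according to p (an assumed
    sampling scheme), runs E local SGD steps on each one starting from the
    current global model, and averages the returned local models with weight 1/K.
    """
    rng = random.Random(seed)
    n = len(a)
    w = 0.0  # global model broadcast at the start of each round
    t = 0    # global iteration counter; the step size decays per local step
    for _ in range(rounds):
        sampled = rng.choices(range(n), weights=p, k=K)
        local_models = []
        for k in sampled:
            wk = w
            for s in range(E):
                # Stochastic gradient: exact gradient plus Gaussian noise (assumption).
                grad = a[k] * (wk - b[k]) + noise * rng.gauss(0.0, 1.0)
                wk -= lr_fn(t + s) * grad
            local_models.append(wk)
        w = sum(local_models) / K  # server aggregation
        t += E
    return w


# Example with three hypothetical clients; each F_k is a_k-smooth and a_k-strongly convex.
a, b, p = [1.0, 2.0, 4.0], [1.0, -1.0, 3.0], [0.5, 0.3, 0.2]
mu, L, E = min(a), max(a), 5
gamma = max(32 * L / mu, E)
w_hat = fedavg_partial(a, b, p, K=100, E=E, rounds=1000,
                       lr_fn=lambda t: 4.0 / (mu * (gamma + t)), noise=0.5)
# Minimizer of the global objective sum_k p_k F_k.
w_star = sum(pk * ak * bk for pk, ak, bk in zip(p, a, b)) / sum(pk * ak for pk, ak in zip(p, a))
print(w_hat, w_star)
```

Varying `K` and `E` in this toy gives a rough feel for the trade-off between participation, local computation and communication. It does not reproduce the paper's experiments.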

Abstract

Federated learning (FL) learns a model jointly from a set of participating devices without the devices sharing their privately held data. The characteristics of non-i.i.d. data across the network, low device participation, high communication costs, and the mandate that data remain private make it challenging to understand the convergence of FL algorithms, particularly how convergence scales with the number of participating devices. In this paper, we focus on Federated Averaging (FedAvg), one of the most popular and effective FL algorithms in use today, as well as its Nesterov accelerated variant, and conduct a systematic study of how their convergence scales with the number of participating devices under non-i.i.d. data and partial participation in convex settings. We provide a unified analysis that establishes convergence guarantees for FedAvg under strongly convex, convex, and overparameterized strongly convex problems. We show that FedAvg enjoys linear speedup in each case, although with different convergence rates and communication efficiencies. For strongly convex and convex problems, we also characterize the corresponding convergence rates for the Nesterov accelerated FedAvg algorithm, which are the first linear speedup guarantees for momentum variants of FedAvg in convex settings. Empirical studies of the algorithms in various settings support our theoretical results.
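The Nesterov accelerated variant replaces each local SGD step with a Nesterov momentum step. Below is a minimal sketch of that idea. It relies on several assumptions:

- full participation;
- exact gradients;
- hypothetical one-dimensional quadratic clients $F_k(w)=\tfrac12 a_k(w-b_k)^2$;
- a constant momentum parameter $\beta$;
- a synchronization rule that averages both the gradient-step and look-ahead iterates every $E$ steps.

These choices, and the name `nesterov_fedavg`, are illustrative assumptions rather than the paper's exact algorithm.

```python
from typing import Callable, List


def nesterov_fedavg(
    a: List[float],
    b: List[float],
    p: List[float],
    E: int,
    rounds: int,
    alpha_fn: Callable[[int], float],
    beta: float,
) -> float:
    """Toy Nesterov FedAvg on hypothetical clients F_k(w) = 0.5*a[k]*(w - b[k])**2.

    Local step on client k:
        v_{t+1} = w_t - alpha_t * grad F_k(w_t)
        w_{t+1} = v_{t+1} + beta * (v_{t+1} - v_t)
    Every E steps the server replaces both sequences by their p-weighted
    averages (an assumed synchronization rule).
    """
    n = len(a)
    w = [0.0] * n  # look-ahead (extrapolated) iterates
    v = [0.0] * n  # gradient-step iterates
    t = 0
    for _ in range(rounds):
        for _ in range(E):
            alpha = alpha_fn(t)
            for k in range(n):
                v_new = w[k] - alpha * a[k] * (w[k] - b[k])
                w[k] = v_new + beta * (v_new - v[k])
                v[k] = v_new
            t += 1
        # Communication round: average both sequences with weights p_k.
        w_bar = sum(pk * wk for pk, wk in zip(p, w))
        v_bar = sum(pk * vk for pk, vk in zip(p, v))
        w = [w_bar] * n
        v = [v_bar] * n
    return sum(pk * wk for pk, wk in zip(p, w))


a, b, p = [1.0, 2.0, 4.0], [1.0, -1.0, 3.0], [0.5, 0.3, 0.2]
mu, L, E = min(a), max(a), 5
gamma = max(32 * L / mu, E)
w_nest = nesterov_fedavg(a, b, p, E=E, rounds=1000,
                         alpha_fn=lambda t: 4.0 / (mu * (gamma + t)), beta=0.5)
w_star = sum(pk * ak * bk for pk, ak, bk in zip(p, a, b)) / sum(pk * ak for pk, ak in zip(p, a))
print(w_nest, w_star)
```

With $\beta=0$ the local update reduces to plain gradient descent, so the same loop also covers (deterministic, full-participation) FedAvg.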

Paper Structure

This paper contains 40 sections, 22 theorems, 181 equations, 2 figures, 2 tables, 1 algorithm.

Key Result

Theorem 1

Let $\overline{\mathbf{w}}_{T}=\sum_{k=1}^{N}p_{k}\mathbf{w}_{T}^{k}$ in FedAvg, $\nu_{\max}=\max_{k}Np_{k}$, and set decaying learning rates $\alpha_{t}=\frac{4}{\mu(\gamma+t)}$ with $\gamma=\max\{32\kappa,E\}$ and $\kappa=\frac{L}{\mu}$. Then, under the paper's standing assumptions (from $L$-smoothness through the bound on the stochastic gradients), with both full participation and partial participation with at least $K$ sampled devices at each communication round
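The step-size schedule and $\nu_{\max}$ in Theorem 1 can be computed directly from $L$, $\mu$, $E$ and the weights $p_k$. Below is a minimal Python sketch. The function name `theorem1_schedule` and the example constants are hypothetical, chosen only for illustration.

```python
from typing import Callable, List, Tuple


def theorem1_schedule(L: float, mu: float, E: int,
                      p: List[float]) -> Tuple[Callable[[int], float], float, float]:
    """Return (alpha, gamma, nu_max) for the decaying step sizes of Theorem 1."""
    kappa = L / mu                          # condition number kappa = L / mu
    gamma = max(32.0 * kappa, float(E))     # gamma = max{32 kappa, E}
    nu_max = max(len(p) * pk for pk in p)   # nu_max = max_k N p_k (1 for uniform weights)

    def alpha(t: int) -> float:
        return 4.0 / (mu * (gamma + t))     # alpha_t = 4 / (mu (gamma + t))

    return alpha, gamma, nu_max


alpha, gamma, nu_max = theorem1_schedule(L=4.0, mu=1.0, E=5, p=[0.5, 0.3, 0.2])
print(gamma, nu_max, alpha(0))  # 128.0 1.5 0.03125
```

Because $\gamma\ge 32\kappa$, every step satisfies $\alpha_t\le\alpha_0=\frac{4}{\mu\gamma}\le\frac{4}{32\mu\kappa}=\frac{1}{8L}$. The $E$ term in $\gamma$ only takes over when the number of local steps exceeds $32\kappa$.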

Figures (2)

  • Figure 1: Linear speedup of FedAvg under full participation and under partial participation, and of Nesterov accelerated FedAvg, respectively. Both axes use a logarithmic scale.
  • Figure 2: Convergence of FedAvg with respect to the number of local steps $E$.

Theorems & Definitions (36)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • Theorem 5
  • Theorem 6
  • Lemma 1
  • Lemma 2
  • Lemma 3: One step progress, strongly convex
  • Lemma 4: Bounding gradient variance (Lemma 2 in Li et al., 2019)
  • ...and 26 more
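The variance bound in Lemma 4 is one place where the dependence on $N$ becomes visible. Below is a sketch of the standard calculation. It rests on the following assumptions, whose notation is assumed for illustration rather than taken from the paper:

- $\mathbf{g}_{t,k}$ is client $k$'s stochastic gradient at $\mathbf{w}_t^k$, unbiased for $\nabla F_k(\mathbf{w}_t^k)$, with variance at most $\sigma_k^2\le\sigma^2$;
- the clients' noise is independent;
- $\mathbf{g}_t=\sum_k p_k\mathbf{g}_{t,k}$ and $\overline{\mathbf{g}}_t=\sum_k p_k\nabla F_k(\mathbf{w}_t^k)$.

```latex
\begin{aligned}
\mathbb{E}\|\mathbf{g}_t-\overline{\mathbf{g}}_t\|^2
&= \mathbb{E}\Big\|\sum_{k=1}^{N} p_k\big(\mathbf{g}_{t,k}-\nabla F_k(\mathbf{w}_t^k)\big)\Big\|^2
 = \sum_{k=1}^{N} p_k^2\,\mathbb{E}\big\|\mathbf{g}_{t,k}-\nabla F_k(\mathbf{w}_t^k)\big\|^2 \\
&\le \sum_{k=1}^{N} p_k^2\sigma_k^2
 \le \sigma^2\sum_{k=1}^{N}\Big(\frac{\nu_{\max}}{N}\Big)^2
 = \frac{\nu_{\max}^2\,\sigma^2}{N}.
\end{aligned}
```

The cross terms vanish by independence and zero mean. The last inequality uses $p_k\le\nu_{\max}/N$, which follows from $\nu_{\max}=\max_k Np_k$ in Theorem 1. With uniform weights $p_k=1/N$, $\nu_{\max}=1$ and the bound is $\sigma^2/N$.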