Refined Analysis of Federated Averaging and Federated Richardson-Romberg

Paul Mangold, Alain Durmus, Aymeric Dieuleveut, Sergey Samsonov, Eric Moulines

TL;DR

The paper revisits FedAvg by proving that, with a constant step size and local updates, its global iterates converge to a stationary distribution, which enables precise first-order characterizations of bias and variance. It separates the bias into a stochastic-gradient-noise component and a client-heterogeneity component, and the analysis covers both deterministic and stochastic gradient settings. A novel Richardson-Romberg extrapolation method reduces both sources of bias without extra memory, yielding reduced communication requirements. The theory is complemented by numerical experiments on logistic regression that illustrate the bias reduction and practical gains in homogeneous and heterogeneous data regimes. This stationary-distribution perspective offers a principled lens for designing federated optimization algorithms with controlled bias and improved efficiency.

Abstract

In this paper, we present a novel analysis of FedAvg with constant step size, relying on the Markov property of the underlying process. We demonstrate that the global iterates of the algorithm converge to a stationary distribution and analyze its resulting bias and variance relative to the problem's solution. We provide a first-order bias expansion in both homogeneous and heterogeneous settings. Interestingly, this bias decomposes into two distinct components: one that depends solely on stochastic gradient noise and another on client heterogeneity. Finally, we introduce a new algorithm based on the Richardson-Romberg extrapolation technique to mitigate this bias.
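
To make the heterogeneity bias and its mitigation concrete, the following is a minimal sketch rather than the paper's algorithm. It assumes a toy setting that is not taken from the paper: three clients with one-dimensional quadratic objectives, deterministic local gradients (in the spirit of FedAvg-D), and a generic Richardson-Romberg combination of two runs with step sizes $\gamma$ and $2\gamma$. The function name `fedavg_d` and all numerical values are illustrative assumptions.

```python
def fedavg_d(a, b, gamma, H, rounds, theta0=0.0):
    """Deterministic FedAvg on toy 1-D quadratics (illustrative assumption).

    Client c minimizes f_c(t) = 0.5 * a[c] * (t - b[c]) ** 2.
    Each round, every client runs H local gradient steps from the global
    iterate, then the server averages the local models.
    """
    theta = theta0
    for _ in range(rounds):
        local_models = []
        for a_c, b_c in zip(a, b):
            t = theta
            for _ in range(H):
                t -= gamma * a_c * (t - b_c)  # local gradient step
            local_models.append(t)
        theta = sum(local_models) / len(local_models)  # server averaging
    return theta


# Hypothetical heterogeneous clients: different curvatures and minimizers.
a = [1.0, 2.0, 4.0]
b = [-1.0, 0.5, 2.0]

# Minimizer of the average objective: curvature-weighted mean of the b[c].
theta_star = sum(ai * bi for ai, bi in zip(a, b)) / sum(a)

gamma, H, rounds = 0.01, 10, 500  # 2 * gamma stays below 1 / max(a)

theta_g = fedavg_d(a, b, gamma, H, rounds)       # limit point with step gamma
theta_2g = fedavg_d(a, b, 2 * gamma, H, rounds)  # limit point with step 2 * gamma

# Generic Richardson-Romberg combination: cancels the bias term that is
# linear in the step size, leaving a higher-order remainder.
theta_rr = 2 * theta_g - theta_2g

print("bias, step gamma     :", theta_g - theta_star)
print("bias, step 2 * gamma :", theta_2g - theta_star)
print("bias, extrapolated   :", theta_rr - theta_star)
```

In this toy problem the limit point of each run is biased away from the true minimizer because the clients disagree, the bias grows with the step size, and the extrapolated estimate lands much closer to the minimizer. The stochastic-gradient-noise component of the bias discussed in the abstract is not modelled here.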

Paper Structure

This paper contains 39 sections, 33 theorems, 267 equations, 1 figure, 1 table, 1 algorithm.

Key Result

Proposition 1

Assume assum:local_functions. Then for all $H > 0$ and $\gamma \le 1/L$, FedAvg-D converges to a unique point $\bar{\theta}_{\textnormal{det}}^{(\gamma, H)}$ that satisfies $\dfedavgop[ \gamma,H](\bar{\theta}_{\textnormal{det}}^{(\gamma, H)}) = \bar{\theta}_{\textnormal{det}}^{(\gamma, H)}$ and $\ps

Figures (1)

  • Figure 1: Mean squared error (MSE) on the synthetic noisy dataset (first row) and on the synthetic heterogeneous dataset (second row), as a function of the number of communications, for $H \in \{10, 100\}$. In the panels labelled Iterates, we plot the MSE of the global iterates of the three methods. In the panels labelled Averaged, we plot the MSE of the iterates for the first 10% of iterations, then the MSE of the averaged iterates for the last 90%. We plot the average over 10 runs, with standard deviation.

Theorems & Definitions (55)

  • Proposition 1: Stationary Point of
  • Proposition 2: Bias of
  • Theorem 1: First-Order Bias of FedAvg-D
  • Corollary 1: Convergence Rate of Deterministic
  • Proposition 3: Convergence of
  • Proposition 4: Convergence to a neighborhood of $\statdistlim{\step,\nlupdates}$
  • Theorem 2: Bias of , Quadratic Functions
  • Theorem 3: Bias of , Homogeneous
  • Theorem 4: Bias of , Heterogeneous
  • Theorem 5: Richardson-Romberg
  • ...and 45 more