Continuous-Time Analysis of Federated Averaging

Tom Overman, Diego Klabjan

TL;DR

This work introduces a continuous-time stochastic differential equation (SDE) model for the server weights in FedAvg, enabling compact convergence and generalization analysis beyond discrete-time results. It proves convergence to stationary points for general non-convex losses and global convergence for weakly quasi-convex losses, under smoothness and variance assumptions, with explicit rate characterizations for common learning-rate schedules. The paper also establishes conditions under which updates to the server weights can be approximated as normal random variables, even with non-IID data and multiple local updates, and it analyzes a quadratic-loss setting to show how FedAvg hyperparameters affect the trade-off between optimality and generalization. Overall, the continuous-time formulation provides a framework for studying FedAvg convergence, normality of updates, and generalization, with the potential to extend to other FL algorithms.

Abstract

Federated averaging (FedAvg) is a popular algorithm for horizontal federated learning (FL), where samples are gathered across different clients and are not shared with each other or a central server. Extensive convergence analysis of FedAvg exists for the discrete iteration setting, guaranteeing convergence for a range of loss functions and varying levels of data heterogeneity. We extend this analysis to the continuous-time setting where the global weights evolve according to a multivariate stochastic differential equation (SDE), which is the first time FedAvg has been studied from the continuous-time perspective. We use techniques from stochastic processes to establish convergence guarantees under different loss functions, some of which are more general than existing work in the discrete setting. We also provide conditions for which FedAvg updates to the server weights can be approximated as normal random variables. Finally, we use the continuous-time formulation to reveal generalization properties of FedAvg.
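For readers less familiar with the discrete-iteration algorithm that the abstract refers to, the following is a minimal sketch of FedAvg on a toy problem. It is not the paper's setup: the quadratic client losses $f_k(w) = \frac{1}{2}\|w - c_k\|^2$, the Gaussian gradient noise, and all names (`fedavg_quadratic`, `centers`, `local_steps`, `noise_std`) are assumptions chosen for illustration only.

```python
import numpy as np

def fedavg_quadratic(centers, p, rounds=200, local_steps=5,
                     lr=0.05, noise_std=0.1, seed=0):
    """Toy FedAvg: client k holds the loss 0.5 * ||w - centers[k]||^2.

    Hypothetical illustration; the noise model and hyperparameters
    are assumptions, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    centers = np.asarray(centers, dtype=float)
    p = np.asarray(p, dtype=float)          # client aggregation weights
    w = np.zeros(centers.shape[1])          # server (global) weights
    for _ in range(rounds):
        local_weights = []
        for c in centers:
            w_k = w.copy()                  # client starts from the server weights
            for _ in range(local_steps):
                # Noisy gradient of the client's quadratic loss.
                grad = (w_k - c) + noise_std * rng.standard_normal(w_k.shape)
                w_k -= lr * grad
            local_weights.append(w_k)
        # Server step: weighted average of the client weights.
        w = np.average(np.stack(local_weights), axis=0, weights=p)
    return w

if __name__ == "__main__":
    print(fedavg_quadratic([[1.0, 0.0], [3.0, 2.0]], [0.25, 0.75]))
```

In this toy case every client has the same Hessian, so the noiseless iteration contracts toward the weighted mean of the client optima; with heterogeneous curvature the FedAvg fixed point generally differs from the global minimizer.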

Paper Structure

This paper contains 21 sections, 12 theorems, 106 equations.

Key Result

Theorem 4.5

We assume Assumptions normality, smooth, same_client_learning_rates, bounded-variance, and constant_variance are met, and the server learning rate $\eta_0(t)=1$. For a random time point $\tilde{t} \in [0,t]$ that follows the distribution $\frac{\eta(\tilde{t})}{\int_{0}^{t}\eta(s)ds}$, we have where $C_1=\frac{E^2L\mu \sum_{k=1}^Qp_k[L+\sqrt{\text{Tr}(\Sigma_k)}]}{2}$, $\varphi(t) = \int_{0}^t \e
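The random time point $\tilde{t}$ in the theorem statement has density proportional to the learning rate $\eta$ on $[0,t]$. A rough sketch of how such a time could be sampled numerically is given below, using inverse-CDF sampling on a grid. The function name `sample_random_time`, the grid-based integration, and the example schedules are assumptions for illustration and do not come from the paper.

```python
import numpy as np

def sample_random_time(eta, t, n_samples, grid_size=10_001, seed=0):
    """Draw samples on [0, t] with density eta(s) / integral_0^t eta(u) du.

    `eta` must accept a NumPy array and return positive values.
    Hypothetical helper; the grid resolution is an arbitrary choice.
    """
    rng = np.random.default_rng(seed)
    s = np.linspace(0.0, t, grid_size)
    vals = eta(s)
    # Cumulative trapezoidal integral of eta over the grid.
    increments = 0.5 * (vals[1:] + vals[:-1]) * np.diff(s)
    cdf = np.concatenate(([0.0], np.cumsum(increments)))
    cdf /= cdf[-1]                          # normalize so that cdf[-1] == 1
    # Inverse-CDF sampling: map uniform draws through the interpolated inverse.
    u = rng.uniform(size=n_samples)
    return np.interp(u, cdf, s)

if __name__ == "__main__":
    decaying = lambda s: 1.0 / (1.0 + s)    # example decaying schedule (assumed)
    print(sample_random_time(decaying, t=10.0, n_samples=5))
```

With a constant schedule the samples are uniform on $[0,t]$; with a decaying schedule they concentrate near the start of the interval, where the learning rate is largest.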

Theorems & Definitions (31)

  • Theorem 4.5
  • proof
  • Corollary 4.6
  • proof
  • Corollary 4.7
  • proof
  • Theorem 4.9
  • proof
  • Corollary 4.10
  • proof
  • ...and 21 more