Federated Learning: A Stochastic Approximation Approach

Srihari P, Anik Kumar Paul, Bharath Bhikkaji

TL;DR

The paper analyzes federated learning through a stochastic-approximation lens, introducing client-specific tapering step sizes that yield an ODE trajectory for the global model with forcing given by the weighted sum of client gradients. Convergence is established almost surely, with client influence governed by the asymptotic step-size ratios $p^{(i)}$. This framework generalizes prior constant-step analyses by accommodating heterogeneous, non-identically distributed data and varying local update rates, and it provides practical guidance for weighting clients to favor rare data. Numerical experiments on federated linear regression and image classification corroborate the theory, showing robust convergence and highlighting how step-size choices and aggregation frequency affect learning dynamics. The work offers a principled mechanism to control client influence in FL and opens avenues for asynchronous extensions and nonstationary data settings.

Abstract

This paper considers federated learning (FL) in a stochastic approximation (SA) framework. Here, each client $i$ trains a local model using its dataset $\mathcal{D}^{(i)}$ and periodically transmits the model parameters $w^{(i)}_n$ to a central server, where they are aggregated into a global model parameter $\bar{w}_n$ and sent back. The clients continue their training by re-initializing their local models with the global model parameters. Prior works typically assumed constant (and often identical) step sizes (learning rates) across clients for model training. As a consequence, the aggregated model converges only in expectation. In this work, client-specific tapering step sizes $a^{(i)}_n$ are used. The global model is shown to track an ODE with a forcing function equal to the weighted sum of the negative gradients of the individual clients. The weights are the limiting ratios $p^{(i)}=\lim_{n \to \infty} \frac{a^{(i)}_n}{a^{(1)}_n}$ of the step sizes, where $a^{(1)}_n \geq a^{(i)}_n, \forall n$. Unlike the constant-step-size case, the convergence here is with probability one. In this framework, clients with larger $p^{(i)}$ exert a greater influence on the global model than those with smaller $p^{(i)}$, which can be used to favor clients that have rare and uncommon data. Numerical experiments were conducted to validate the convergence and to demonstrate how the choice of step sizes regulates the influence of the clients.
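The scheme described in the abstract can be sketched on a toy federated linear regression problem. This is a minimal illustration under stated assumptions, not the authors' implementation: the step-size form $a^{(i)}_n = p^{(i)} c/(n+1)^{0.75}$, plain averaging at the server, the synchronization period, and all function names (`make_client`, `run_fl`, `weighted_stationary_point`) are hypothetical choices made here. Writing $f^{(i)}$ for client $i$'s local loss (notation assumed here), plain averaging over $L$ clients gives the limiting ODE $\dot{w}(t) = -\tfrac{1}{L}\sum_{i} p^{(i)} \nabla f^{(i)}(w(t))$, whose equilibrium solves $\sum_i p^{(i)} \nabla f^{(i)}(w) = 0$.

```python
import numpy as np


def make_client(rng, w_true, m=200, noise=0.1):
    """Synthetic local dataset D^(i): y = X @ w_true + noise (illustrative)."""
    X = rng.normal(size=(m, w_true.size))
    y = X @ w_true + noise * rng.normal(size=m)
    return X, y


def run_fl(clients, p, n_steps=20000, sync_every=5, c=0.2, power=0.75, seed=0):
    """Local SGD with client-specific tapering step sizes and periodic averaging.

    Assumed step size: a_n^(i) = p[i] * c / (n + 1)**power, so the ratio
    a_n^(i) / a_n^(1) equals p[i] when p[0] == 1.
    """
    rng = np.random.default_rng(seed)
    d = clients[0][0].shape[1]
    w_global = np.zeros(d)
    w_local = [w_global.copy() for _ in clients]
    for n in range(n_steps):
        base = c / (n + 1) ** power
        for i, (X, y) in enumerate(clients):
            j = rng.integers(len(y))                    # one random local sample
            grad = (X[j] @ w_local[i] - y[j]) * X[j]    # stochastic gradient
            w_local[i] -= p[i] * base * grad
        if (n + 1) % sync_every == 0:
            # Server aggregates (plain average, an assumption) and broadcasts.
            w_global = np.mean(w_local, axis=0)
            w_local = [w_global.copy() for _ in clients]
    return w_global


def weighted_stationary_point(clients, p):
    """Solve sum_i p_i * grad f_i(w) = 0 for least-squares local losses."""
    A = sum(pi * X.T @ X / len(y) for pi, (X, y) in zip(p, clients))
    b = sum(pi * X.T @ y / len(y) for pi, (X, y) in zip(p, clients))
    return np.linalg.solve(A, b)
```

To try it, build a few clients with different `w_true` vectors, call `run_fl(clients, p)` with for example `p = [1.0, 0.25, 0.25]`, and compare the result with `weighted_stationary_point(clients, p)`. In this toy setup, lowering `p[i]` for a client pulls the limit point away from that client's local solution, which is the influence mechanism the abstract describes.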

Paper Structure

This paper contains 24 sections, 45 equations, 22 figures, and 1 table.

Figures (22)

  • Figure 1: Federated Learning setup with $L$ clients.
  • Figure 2: Equal Step Sizes. The parameters converge to the global minimum. The aggregated gradient norm decays to zero, while the individual gradient norms remain non-zero.
  • Figure 3: Unequal Step Sizes with Finite Influence. Only two clients influence the model. The combined gradient norm of Client 1 and Client 2 goes to zero, while the individual gradient norms remain non-zero.
  • Figure 4: Unequal Step Sizes with Vanishing Influence. Only Client 1 has persistent influence. The global model converges to Client 1's parameters, and only Client 1's gradient decays to zero.
  • Figure 5: Effect of $\delta$ on parameter convergence and gradient norm decay. Smaller $\delta$ gives faster decay but less stable convergence.
  • ...and 17 more figures
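
One plausible reading of the regimes named in the captions of Figures 2 to 4 is in terms of the limiting ratio $p^{(i)}$ of client $i$'s step size to client 1's: equal step sizes give $p^{(i)} = 1$, a constant-factor gap gives $0 < p^{(i)} < 1$, and a faster-decaying schedule gives $p^{(i)} = 0$. The sketch below evaluates these ratios numerically. The schedule form `scale / (n + 1)**power` and the specific constants are assumptions made for illustration; the schedules actually used in the paper's experiments are not given in this summary.

```python
def step_size(n, scale=1.0, power=0.75):
    """Illustrative tapering step size a_n = scale / (n + 1)**power."""
    return scale / (n + 1) ** power


def reference(n):
    """Client 1's step size a_n^(1), the largest by assumption."""
    return step_size(n)


# Hypothetical schedules for another client i, one per regime.
schedules = {
    "equal": lambda n: step_size(n),                 # ratio stays at 1
    "finite": lambda n: step_size(n, scale=0.5),     # ratio stays at 0.5
    "vanishing": lambda n: step_size(n, power=1.0),  # ratio decays to 0
}


def ratios_at(n):
    """Return a_n^(i) / a_n^(1) for each schedule at iteration n."""
    return {name: a(n) / reference(n) for name, a in schedules.items()}
```

Evaluating `ratios_at(n)` for increasing `n` shows the first two ratios holding steady while the third shrinks like `(n + 1)**-0.25`, so that client's influence on the limiting dynamics fades.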