Federated Learning: A Stochastic Approximation Approach
Srihari P, Anik Kumar Paul, Bharath Bhikkaji
TL;DR
The paper analyzes federated learning through a stochastic-approximation lens, introducing client-specific tapering step sizes under which the global model tracks an ODE whose forcing function is the weighted sum of the clients' negative gradients. Convergence is established almost surely, with client influence governed by the asymptotic step-size ratios $p^{(i)}$. This framework generalizes prior constant-step analyses by accommodating heterogeneous, non-identically distributed data and varying local update rates, and it provides practical guidance for weighting clients to favor rare data. Numerical experiments on federated linear regression and image classification corroborate the theory, showing robust convergence and highlighting how step-size choices and aggregation frequency affect learning dynamics. The work offers a principled mechanism to control client influence in FL and opens avenues for asynchronous extensions and nonstationary data settings.
Abstract
This paper considers federated learning (FL) in a stochastic approximation (SA) framework. Here, each client $i$ trains a local model using its dataset $\mathcal{D}^{(i)}$ and periodically transmits the model parameters $w^{(i)}_n$ to a central server, where they are aggregated into a global model parameter $\bar{w}_n$ and sent back. The clients continue their training by re-initializing their local models with the global model parameters. Prior works typically assumed constant (and often identical) step sizes (learning rates) across clients for model training. As a consequence, the aggregated model converges only in expectation. In this work, client-specific tapering step sizes $a^{(i)}_n$ are used. The global model is shown to track an ODE with a forcing function equal to the weighted sum of the negative gradients of the individual clients. The weights are the limiting ratios $p^{(i)}=\lim_{n \to \infty} \frac{a^{(i)}_n}{a^{(1)}_n}$ of the step sizes, where $a^{(1)}_n \geq a^{(i)}_n$ for all $n$. Unlike in the constant step-size case, the convergence here is with probability one. In this framework, clients with larger $p^{(i)}$ exert a greater influence on the global model than those with smaller $p^{(i)}$, which can be used to favor clients that have rare and uncommon data. Numerical experiments validate the convergence and demonstrate how the choice of step sizes regulates the influence of the clients.
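
The mechanism described in the abstract can be illustrated with a small federated linear regression. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the step-size form $a^{(i)}_n = c_i/(n+n_0)^{0.75}$, the constants `scales`, the synthetic data, and the use of a plain average at the server are all hypothetical choices made for illustration. Writing $F^{(i)}$ (a symbol introduced here) for client $i$'s empirical squared loss, the ODE forcing $-\sum_i p^{(i)} \nabla F^{(i)}(w)$ vanishes, up to a positive scaling, at the minimizer of $\sum_i p^{(i)} F^{(i)}$, which has a closed form for linear regression and serves as the reference point.

```python
import numpy as np

def make_client(rng, w_true, m=200, noise=0.1):
    """Synthetic data y = x @ w_true + noise for one (hypothetical) client."""
    X = rng.normal(size=(m, w_true.size))
    y = X @ w_true + noise * rng.normal(size=m)
    return X, y

def step_size(n, c, n0=10.0, power=0.75):
    """Assumed tapering step a_n = c / (n + n0)**power.
    For 0.5 < power <= 1 the steps sum to infinity while their squares are summable."""
    return c / (n + n0) ** power

def federated_sa(clients, scales, rounds=4000, local_steps=5, seed=0):
    """Clients run local SGD from the global model; the server averages the results."""
    rng = np.random.default_rng(seed)
    w_global = np.zeros(clients[0][0].shape[1])
    n = 0  # global iteration counter shared by all clients
    for _ in range(rounds):
        local_models = []
        for (X, y), c in zip(clients, scales):
            w = w_global.copy()  # re-initialize from the global model
            for k in range(local_steps):
                j = rng.integers(X.shape[0])       # draw one local sample
                grad = X[j] * (X[j] @ w - y[j])    # gradient of 0.5 * (x @ w - y)**2
                w -= step_size(n + k, c) * grad    # client-specific tapering step
            local_models.append(w)
        n += local_steps
        w_global = np.mean(local_models, axis=0)   # server aggregation (plain mean, assumed)
    return w_global

def weighted_minimizer(clients, p):
    """Closed-form minimizer of sum_i p_i * F_i for least-squares clients."""
    A = sum(pi * X.T @ X / len(y) for pi, (X, y) in zip(p, clients))
    b = sum(pi * X.T @ y / len(y) for pi, (X, y) in zip(p, clients))
    return np.linalg.solve(A, b)

data_rng = np.random.default_rng(1)
clients = [
    make_client(data_rng, np.array([1.0, 0.0])),  # client 1, largest step size
    make_client(data_rng, np.array([0.0, 1.0])),  # client 2, smaller step size
]
scales = [0.5, 0.125]                   # c_1 >= c_2, so a_n^(1) >= a_n^(2) for all n
p = [c / scales[0] for c in scales]     # limiting ratios p^(i) = c_i / c_1
w_hat = federated_sa(clients, scales)
w_star = weighted_minimizer(clients, p)
print("federated estimate:", w_hat)
print("weighted minimizer:", w_star)
```

In this sketch the two clients hold data generated by different parameter vectors, so the limit depends on the weights: raising `scales[1]` toward `scales[0]` moves the reference point, and with it the federated estimate, toward client 2's parameters. This is the lever the abstract describes for favoring clients with rare data.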
