An Operator Splitting View of Federated Learning
Saber Malekmohammadi, Kiarash Shaloudegi, Zeou Hu, Yaoliang Yu
TL;DR
This paper recasts federated learning as an operator-splitting problem, unifying core algorithms under a single framework and clarifying how the step size and local updates influence convergence. It shows that FedAvg corresponds to forward-backward splitting, FedProx to backward-backward splitting, FedSplit to Peaceman-Rachford splitting, FedPi to Douglas-Rachford splitting, and FedRP to reflection-projection splitting, which exposes the connections between these methods and suggests new algorithmic variants. The authors also show how Anderson acceleration can speed up convergence without adding communication overhead, and they validate the theory with experiments on convex and nonconvex models. The result is a standardized, extensible view of FL algorithms that simplifies their implementation, comparison, and acceleration.
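The correspondences above can be sketched in a few lines of Python. This sketch uses a hypothetical toy problem with two clients holding scalar quadratic losses, equal averaging weights, and step sizes chosen only for illustration. Averaging the clients' copies is the projection onto the consensus set, so a FedAvg round (local gradient steps, then averaging) acts as a forward-backward step, a FedProx round (local proximal steps, then averaging) acts as a backward-backward step, and FedSplit iterates the Peaceman-Rachford map built from the two reflections. The helper names (`prox`, `local_gd`, `fedavg_round`, `fedprox_round`, `fedsplit_round`, `iterate`) are ours, not the paper's, and the paper's exact update rules may differ in bookkeeping.

```python
import numpy as np

# Hypothetical toy problem (not from the paper): client i holds the scalar quadratic
# f_i(w) = 0.5 * a[i] * (w - b[i])**2, whose sum is minimized at sum(a*b) / sum(a) = 0.8.
a = np.array([1.0, 4.0])
b = np.array([0.0, 1.0])


def prox(v, eta):
    """Backward step on every client: argmin_x f_i(x) + (x - v_i)**2 / (2 * eta)."""
    return (eta * a * b + v) / (1.0 + eta * a)


def local_gd(v, eta, k):
    """Forward step(s): k local gradient-descent steps on every client."""
    for _ in range(k):
        v = v - eta * a * (v - b)
    return v


def fedavg_round(w, eta, k):
    # Forward-backward: local gradient steps, then averaging, which is the
    # projection onto the consensus set {w_1 = ... = w_m}.
    return local_gd(np.full(a.shape, w), eta, k).mean()


def fedprox_round(w, eta):
    # Backward-backward: local proximal steps, then averaging.
    return prox(np.full(a.shape, w), eta).mean()


def fedsplit_round(z, eta):
    # Peaceman-Rachford: z <- R_C(R_F(z)) with reflections R = 2P - I, where
    # P_F is the client-wise prox and P_C replaces every entry by the average.
    r = 2.0 * prox(z, eta) - z
    return 2.0 * r.mean() - r


def iterate(step, x0, rounds=200):
    """Apply the server map `rounds` times."""
    x = x0
    for _ in range(rounds):
        x = step(x)
    return x


w_avg1 = iterate(lambda w: fedavg_round(w, 0.1, k=1), 0.0)
w_avg5 = iterate(lambda w: fedavg_round(w, 0.1, k=5), 0.0)
w_prox = iterate(lambda w: fedprox_round(w, 0.1), 0.0)
z_split = iterate(lambda z: fedsplit_round(z, 1.0), np.zeros(2))
w_split = prox(z_split, 1.0).mean()  # model read off the Peaceman-Rachford fixed point

print(f"FedAvg k=1: {w_avg1:.4f}, FedAvg k=5: {w_avg5:.4f}")
print(f"FedProx: {w_prox:.4f}, FedSplit: {w_split:.4f}")
```

Under these assumptions, FedAvg with one local step and FedSplit both settle at the true minimizer 0.8, whereas FedProx with step size 0.1 settles at 22/29 ≈ 0.759 and FedAvg with five local steps at about 0.69. In this toy case the fixed points of the backward-backward map and the multi-step forward-backward map depend on the step size, while the Peaceman-Rachford fixed point does not.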
Abstract
Over the past few years, the federated learning ($\texttt{FL}$) community has witnessed a proliferation of new $\texttt{FL}$ algorithms. However, our understanding of the theory of $\texttt{FL}$ is still fragmented, and a thorough, formal comparison of these algorithms remains elusive. Motivated by this gap, we show that many of the existing $\texttt{FL}$ algorithms can be understood from an operator splitting point of view. This unification allows us to compare different algorithms with ease, to refine previous convergence results and to uncover new algorithmic variants. In particular, our analysis reveals the vital role played by the step size in $\texttt{FL}$ algorithms. The unification also leads to a streamlined and economical way to accelerate $\texttt{FL}$ algorithms, without incurring any communication overhead. We perform numerical experiments on both convex and nonconvex models to validate our findings.
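The claim that acceleration comes without communication overhead can be illustrated with textbook type-II Anderson acceleration, the technique named in the TL;DR, applied to a FedProx-style server map. This is a sketch under stated assumptions, not the paper's algorithm: the least-squares clients, the random data, the step size `eta`, the memory `m`, and all function names are hypothetical. The server stores its last few models and residuals and extrapolates from them, while clients still perform exactly one proximal update per round, so the comparison is purely in the number of communication rounds.

```python
import numpy as np

# Hypothetical setup: 4 least-squares clients f_i(w) = 0.5 * ||A_i w - y_i||^2
# with random data, and a FedProx-style server map with step size eta.
rng = np.random.default_rng(0)
n_clients, dim, eta = 4, 3, 0.01
A = [rng.normal(size=(10, dim)) for _ in range(n_clients)]
y = [rng.normal(size=10) for _ in range(n_clients)]


def fedprox_round(w):
    """One communication round: each client returns prox_{eta f_i}(w); the server averages."""
    outs = [np.linalg.solve(np.eye(dim) + eta * Ai.T @ Ai, w + eta * Ai.T @ yi)
            for Ai, yi in zip(A, y)]
    return np.mean(outs, axis=0)


def run_plain(g, x0, tol=1e-10, max_rounds=100_000):
    """Plain fixed-point iteration x <- g(x); returns (solution, rounds used)."""
    x = x0
    for k in range(1, max_rounds + 1):
        gx = g(x)
        if np.linalg.norm(gx - x) < tol:
            return gx, k
        x = gx
    return x, max_rounds


def run_anderson(g, x0, m=5, tol=1e-10, max_rounds=100_000):
    """Type-II Anderson acceleration of x <- g(x) with memory m.

    The history lives on the server only, so each round still costs exactly
    one call to g, i.e. one broadcast plus one aggregation.
    """
    x = x0
    G, F = [], []  # recent values of g(x) and residuals g(x) - x
    for k in range(1, max_rounds + 1):
        gx = g(x)
        f = gx - x
        if np.linalg.norm(f) < tol:
            return gx, k
        G, F = (G + [gx])[-(m + 1):], (F + [f])[-(m + 1):]
        if len(F) == 1:
            x = gx  # no differences yet: take a plain step
            continue
        dF = np.diff(np.array(F), axis=0).T  # columns f_{j+1} - f_j
        dG = np.diff(np.array(G), axis=0).T  # columns g_{j+1} - g_j
        gamma, *_ = np.linalg.lstsq(dF, f, rcond=None)
        x = gx - dG @ gamma  # extrapolated model broadcast in the next round
    return x, max_rounds


x0 = np.zeros(dim)
w_plain, rounds_plain = run_plain(fedprox_round, x0)
w_aa, rounds_aa = run_anderson(fedprox_round, x0)

# The toy map is affine, g(w) = M w + c, so its fixed point can be checked directly.
c = fedprox_round(np.zeros(dim))
M = np.column_stack([fedprox_round(e) - c for e in np.eye(dim)])
w_star = np.linalg.solve(np.eye(dim) - M, c)
print(f"plain: {rounds_plain} rounds, Anderson: {rounds_aa} rounds")
```

Both runs reach the same fixed point as the direct solve, and the Anderson run needs far fewer rounds. The gap is exaggerated here because the toy map is affine and the memory spans the whole 3-dimensional model space, so the speed-up should not be expected to carry over unchanged to nonconvex models.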
