SPAM: Stochastic Proximal Point Method with Momentum Variance Reduction for Non-convex Cross-Device Federated Learning
Avetik Karagulyan, Egor Shulgin, Abdurakhmon Sadiev, Peter Richtárik
TL;DR
This paper addresses cross-device federated learning with non-convex losses, where the number of clients can reach into the billions. It introduces SPAM, a framework that couples Momentum Variance Reduction (MVR) on the server with a Stochastic Proximal Point method on the clients, and extends it to partial participation via SPAM-PP. The analysis establishes convergence under Hessian similarity without requiring smoothness, and shows an optimal convergence rate of $O(K^{-1/3})$ in the number of iterations $K$ for reaching an $\varepsilon$-stationary point, with improved dependence on the Hessian similarity $\delta$ and gradient variance $\sigma$. Empirical results on a distributed ridge regression task corroborate the theory and illustrate robustness to inexact proximal computations. The method is flexible, state-free, and agnostic to the choice of local solver, which matters for communication efficiency in large-scale cross-device FL.
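To make the pairing of a momentum-based gradient estimator with a client-side proximal step concrete, here is a minimal sketch on synthetic ridge regression. It is not the paper's algorithm listing: the STORM-style estimator update, the gradient-corrected proximal step, the synthetic data, the parameter values, and every function name (`make_clients`, `grad`, `prox`, `full_grad_norm`, `run_sketch`) are assumptions made for illustration.

```python
import numpy as np

def make_clients(num_clients=10, n=100, d=5, seed=0):
    """Synthetic ridge-regression clients sharing one ground-truth vector (assumed setup)."""
    rng = np.random.default_rng(seed)
    x_true = rng.normal(size=d)
    clients = []
    for _ in range(num_clients):
        A = rng.normal(size=(n, d))
        b = A @ x_true + 0.1 * rng.normal(size=n)
        clients.append((A, b))
    return clients

def grad(client, x, lam):
    """Gradient of f_m(x) = ||A x - b||^2 / (2 n) + lam * ||x||^2 / 2."""
    A, b = client
    return A.T @ (A @ x - b) / len(b) + lam * x

def prox(client, v, gamma, lam):
    """Exact proximal point of gamma * f_m at v (closed form for ridge regression)."""
    A, b = client
    n, d = A.shape
    H = A.T @ A / n + (lam + 1.0 / gamma) * np.eye(d)
    return np.linalg.solve(H, A.T @ b / n + v / gamma)

def full_grad_norm(clients, x, lam):
    """Norm of the gradient of the average objective, used only for monitoring."""
    return float(np.linalg.norm(np.mean([grad(c, x, lam) for c in clients], axis=0)))

def run_sketch(clients, steps=300, gamma=1.0, p=0.1, lam=0.1, seed=1):
    rng = np.random.default_rng(seed)
    d = clients[0][0].shape[1]
    x_prev = np.zeros(d)
    x = np.zeros(d)
    # Assumed initialization: start the estimator from the full gradient.
    g = np.mean([grad(c, x, lam) for c in clients], axis=0)
    for _ in range(steps):
        # Server side: momentum variance reduction with one sampled client gradient.
        j = rng.integers(len(clients))
        g = grad(clients[j], x, lam) + (1 - p) * (g - grad(clients[j], x_prev, lam))
        # Client side: proximal step on the local loss, shifted by the gap
        # between the local gradient and the server estimate (assumed form).
        m = rng.integers(len(clients))
        shift = x + gamma * (grad(clients[m], x, lam) - g)
        x_prev, x = x, prox(clients[m], shift, gamma, lam)
    return x

clients = make_clients()
x_final = run_sketch(clients)
print(full_grad_norm(clients, np.zeros(5), 0.1), full_grad_norm(clients, x_final, 0.1))
```

The proximal step here is solved exactly through a linear system. Any local solver that approximately minimizes the same subproblem could replace `prox`, which is the sense in which such a scheme is agnostic to the local solver.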
Abstract
Cross-device training is a crucial subfield of federated learning, where the number of clients can reach into the billions. Standard approaches and local methods are prone to issues such as client drift and insensitivity to data similarities. We propose a novel algorithm (SPAM) for cross-device federated learning with non-convex losses, which solves both issues. We provide a sharp analysis under second-order (Hessian) similarity, a condition satisfied by a variety of machine learning problems in practice. Additionally, we extend our results to the partial participation setting, where a cohort of selected clients communicates with the server at each communication round. Our method is the first of its kind that does not require smoothness of the objective and provably benefits from clients having similar data.
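For orientation, the proximal operator and a Hessian similarity condition are commonly written as below. This is a sketch in assumed notation ($f_m$ for the loss of client $m$, $M$ for the number of clients, $\gamma$ for the step size); the paper's exact assumption may be stated differently, for example in an averaged form.

```latex
% Objective: average of client losses (assumed notation)
f(x) = \frac{1}{M} \sum_{m=1}^{M} f_m(x)

% Proximal operator of the local loss f_m with step size \gamma > 0
\operatorname{prox}_{\gamma f_m}(x) = \arg\min_{y} \left\{ f_m(y) + \frac{1}{2\gamma} \| y - x \|^2 \right\}

% A common pointwise form of second-order (Hessian) similarity with parameter \delta
\left\| \nabla^2 f_m(x) - \nabla^2 f(x) \right\| \le \delta \quad \text{for all } x \text{ and all } m
```

A small $\delta$ means the clients' losses have nearly the same curvature as the average objective, even when each individual $f_m$ is far from smooth in the usual Lipschitz-gradient sense.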
