Momentum Approximation in Asynchronous Private Federated Learning
Tao Yu, Congzheng Song, Jianyu Wang, Mona Chitnis
TL;DR
This work tackles the challenge of combining momentum with asynchronous federated learning (AsyncFL) by identifying an implicit momentum bias caused by stale updates. It introduces momentum approximation (MA), an online least-squares weighting scheme that makes the effective weights on historical updates approximate those of synchronous momentum, thereby recovering momentum's acceleration. On large-scale benchmarks with and without differential privacy (DP), the authors show that MA and its lightweight variant achieve substantial convergence speedups (up to 4×) and notable utility gains (3–20%), while remaining compatible with secure aggregation and DP. The method is simple to implement in production FL systems and reduces the need for extensive momentum tuning across tasks, improving scalability and privacy-preserving performance in asynchronous settings.
Abstract
Asynchronous protocols have been shown to improve the scalability of federated learning (FL) with a massive number of clients. Meanwhile, momentum-based methods can achieve the best model quality in synchronous FL. However, naively applying momentum in asynchronous FL algorithms leads to slower convergence and degraded model performance. It is still unclear how to effectively combine these two techniques to achieve a win-win. In this paper, we find that asynchrony introduces an implicit bias into momentum updates. To address this problem, we propose momentum approximation, which minimizes the bias by finding an optimal weighted average of all historical model updates. Momentum approximation is compatible with secure aggregation as well as differential privacy, and can be easily integrated into production FL systems at a minor communication and storage cost. We empirically demonstrate that on benchmark FL datasets, momentum approximation can achieve a $1.15 \textrm{--}4\times$ speedup in convergence compared to naively combining asynchronous FL with momentum.
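The idea of an implicit bias, and of correcting it with a weighted average of historical updates, can be illustrated with a toy model. The sketch below is our own simplification, not necessarily the paper's exact formulation. It assumes that each aggregated server update u_j is a known linear combination of "fresh" updates g_i, recorded in a coefficient matrix C, and that the target is the heavy-ball momentum weight β^(t−i) on each g_i. Naive asynchronous momentum weights u_j by β^(t−j), so g_i ends up with effective weight Σ_j β^(t−j) C[j, i] instead of β^(t−i). That mismatch plays the role of the implicit bias. Momentum approximation is then sketched as a least-squares fit of history weights α so that Cᵀα matches the synchronous weights. The function names, the staleness pattern (each client update split evenly between the step it was computed and the next step), and β = 0.9 are all hypothetical.

```python
import numpy as np

# Toy sketch of momentum approximation (MA). The staleness model, the
# matching objective, and all function names are hypothetical simplifications.


def staleness_matrix(num_steps: int, fresh_fraction: float = 0.5) -> np.ndarray:
    """C[j, i] = weight of fresh update g_i inside aggregated server update u_j.

    Assumed pattern: a share `fresh_fraction` of g_i reaches the server at
    step i and the rest arrives one step late, at step i + 1.
    """
    c = np.zeros((num_steps, num_steps))
    for i in range(num_steps):
        c[i, i] = fresh_fraction
        if i + 1 < num_steps:
            c[i + 1, i] = 1.0 - fresh_fraction
    return c


def sync_momentum_weights(num_steps: int, beta: float) -> np.ndarray:
    """Heavy-ball momentum m_t = beta * m_{t-1} + g_t puts beta**(t - i) on g_i."""
    return beta ** np.arange(num_steps - 1, -1, -1, dtype=float)


def naive_async_weights(c: np.ndarray, beta: float) -> np.ndarray:
    """Effective weight on each g_i when momentum is applied to the u_j directly."""
    return c.T @ sync_momentum_weights(c.shape[0], beta)


def momentum_approximation_weights(c: np.ndarray, beta: float) -> np.ndarray:
    """History weights alpha with C^T alpha closest (least squares) to sync momentum."""
    target = sync_momentum_weights(c.shape[0], beta)
    alpha, *_ = np.linalg.lstsq(c.T, target, rcond=None)
    return alpha


num_steps, beta = 6, 0.9
c = staleness_matrix(num_steps)
target = sync_momentum_weights(num_steps, beta)
naive = naive_async_weights(c, beta)
alpha = momentum_approximation_weights(c, beta)
ma = c.T @ alpha  # effective weights on g_i after MA reweighting

print("sync  :", np.round(target, 3))
print("naive :", np.round(naive, 3))
print("MA    :", np.round(ma, 3))

# The server only combines aggregated updates u_j = sum_i C[j, i] g_i.
rng = np.random.default_rng(0)
grads = rng.normal(size=(num_steps, 4))  # hypothetical fresh updates g_i
updates = c @ grads                      # what the asynchronous server receives
sync_direction = target @ grads
ma_direction = alpha @ updates
print("MA matches sync momentum:", np.allclose(ma_direction, sync_direction))
```

With six steps and β = 0.9, the naive effective weights on the two most recent fresh updates come out as 0.95 and 0.5 instead of 0.9 and 1.0. Stale arrivals inflate older updates, and the newest one is under-weighted. Because C is invertible in this toy, the least-squares fit recovers the synchronous weights exactly. Combining only the aggregated updates u_j then reproduces the synchronous momentum direction. This sketch recomputes α from scratch at a single step, whereas the paper describes an online scheme. With realistic staleness patterns an exact fit is generally unavailable, and the fitted α here already has alternating signs, so a practical variant may need safeguards that this sketch does not model.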
