Double Momentum and Error Feedback for Clipping with Fast Rates and Differential Privacy
Rustem Islamov, Samuel Horvath, Aurelien Lucchi, Peter Richtarik, Eduard Gorbunov
TL;DR
This paper addresses the challenge of achieving both strong optimization guarantees and differential privacy (DP) in federated learning (FL) with heterogeneous data. It introduces Clip21-SGD2M, a method that combines clipping, EF21-style error feedback, and double momentum to stabilize updates and control the accumulation of DP noise. Theoretical results establish an optimal $O(1/T)$ convergence rate with full gradients, a near-optimal $\tilde{O}(1/\sqrt{nT})$ rate with stochastic gradients, and a near-optimal local-DP privacy-utility trade-off when DP noise is added. Experiments on non-convex logistic regression and neural network training support its practical advantages over Clip-SGD and Clip21-SGD. The approach thus advances privacy-preserving FL by delivering robust optimization performance under realistic heterogeneity and privacy constraints, with potential extensions to heavy-tailed noise and adaptive optimization variants.
Abstract
Strong Differential Privacy (DP) and optimization guarantees are two desirable properties for a method in Federated Learning (FL). However, existing algorithms do not achieve both properties at once: they either have optimal DP guarantees but rely on restrictive assumptions such as bounded gradients or bounded data heterogeneity, or they ensure strong optimization performance but lack DP guarantees. To address this gap in the literature, we propose and analyze a new method called Clip21-SGD2M, based on a novel combination of clipping, heavy-ball momentum, and Error Feedback. In particular, for non-convex smooth distributed problems with clients having arbitrarily heterogeneous data, we prove that Clip21-SGD2M has an optimal convergence rate and a near-optimal (local-)DP neighborhood. Our numerical experiments on non-convex logistic regression and the training of neural networks highlight the superiority of Clip21-SGD2M over baselines in terms of optimization performance for a given DP budget.
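To make the three ingredients named above concrete, the following is a minimal Python sketch of how norm clipping, a momentum estimate, and an EF21-style error-feedback state could fit together in a single client update. It is an illustration under stated assumptions, not the Clip21-SGD2M algorithm as specified in the paper: the function names `clip` and `client_step`, the parameters `beta` and `tau`, and the exact update order are hypothetical, and the server-side momentum and DP noise are omitted.

```python
import numpy as np


def clip(v, tau):
    """Rescale v so that its Euclidean norm is at most tau (standard norm clipping)."""
    norm = np.linalg.norm(v)
    if norm <= tau:
        return v.copy()
    return v * (tau / norm)


def client_step(grad, v, g, beta, tau):
    """One hypothetical client update (assumed form, not taken from the paper).

    grad : current local (stochastic) gradient
    v    : client momentum estimate
    g    : client error-feedback state
    """
    # Momentum: exponential moving average of local gradients.
    v_new = (1.0 - beta) * v + beta * grad
    # EF21-style step: clip the difference to the current state, not the gradient itself.
    c = clip(v_new - g, tau)
    # The state moves toward the momentum estimate by at most tau in norm.
    g_new = g + c
    return v_new, g_new, c


if __name__ == "__main__":
    grad = np.array([3.0, 4.0])
    v0 = np.zeros(2)
    g0 = np.zeros(2)
    v1, g1, c1 = client_step(grad, v0, g0, beta=0.5, tau=1.0)
    print(v1, g1, np.linalg.norm(c1))
```

Clipping the difference `v_new - g` rather than the gradient is the design choice that error feedback relies on: the transmitted vector `c` always has norm at most `tau`, which is what a DP mechanism would need in order to calibrate its noise, while the state `g` can still track an unbounded gradient over several rounds.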
