Double Momentum and Error Feedback for Clipping with Fast Rates and Differential Privacy

Rustem Islamov; Samuel Horvath; Aurelien Lucchi; Peter Richtarik; Eduard Gorbunov

TL;DR

This paper addresses the challenge of achieving both strong optimization guarantees and differential privacy (DP) in federated learning (FL) with heterogeneous data. It introduces Clip21-SGD2M, a method that combines clipping, EF21-style error feedback, and double momentum to stabilize updates and control the accumulation of DP noise. The theoretical results establish an optimal $O(1/T)$ convergence rate with full gradients, a near-optimal $\tilde{O}(1/\sqrt{nT})$ rate with stochastic gradients, and a near-optimal local-DP utility trade-off when DP noise is added. Experiments on non-convex logistic regression and neural network training show practical advantages over Clip-SGD and Clip21-SGD. The method gives privacy-preserving FL optimization guarantees that hold under arbitrary data heterogeneity, and it could be extended to heavy-tailed noise and adaptive optimization variants.

Abstract

Strong differential privacy (DP) and optimization guarantees are two desirable properties for a method in federated learning (FL). However, existing algorithms do not achieve both at once: they either have optimal DP guarantees but rely on restrictive assumptions such as bounded gradients or bounded data heterogeneity, or they ensure strong optimization performance but lack DP guarantees. To address this gap in the literature, we propose and analyze a new method called Clip21-SGD2M, based on a novel combination of clipping, heavy-ball momentum, and error feedback. In particular, for non-convex smooth distributed problems with clients having arbitrarily heterogeneous data, we prove that Clip21-SGD2M has an optimal convergence rate and a near-optimal (local-)DP neighborhood. Our numerical experiments on non-convex logistic regression and training of neural networks highlight the superiority of Clip21-SGD2M over baselines in terms of optimization performance for a given DP budget.
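To make the ingredients named in the abstract concrete, the following is a minimal sketch of how clipping, client-side momentum, and an EF21-style error-feedback shift might fit together on a single client. It is an illustration under assumptions, not the paper's algorithm: the function names `clip` and `client_step`, the parameters `tau` (clipping radius) and `beta` (momentum weight), and the exact update order are all hypothetical, and the server-side momentum and DP noise are omitted.

```python
import math


def clip(vec, tau):
    """Rescale vec so that its Euclidean norm is at most tau."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm <= tau:
        return list(vec)
    return [x * tau / norm for x in vec]


def client_step(grad, shift, momentum, tau, beta):
    """One hypothetical client update (assumed form, for illustration only).

    1. Momentum: blend the new gradient into a running average.
    2. Error feedback: clip the difference between the momentum
       estimate and the current shift, not the gradient itself.
    3. Move the shift by the clipped difference.
    Returns the clipped message, the new shift, and the new momentum.
    """
    momentum = [(1 - beta) * m + beta * g for m, g in zip(momentum, grad)]
    delta = clip([m - s for m, s in zip(momentum, shift)], tau)
    shift = [s + d for s, d in zip(shift, delta)]
    return delta, shift, momentum


# Usage: repeated steps with a fixed gradient let the shift catch up to it,
# even though every transmitted message has norm at most tau.
shift, momentum = [0.0, 0.0], [0.0, 0.0]
for _ in range(10):
    delta, shift, momentum = client_step([3.0, 4.0], shift, momentum,
                                         tau=1.0, beta=1.0)
```

Clipping the difference to a learned shift, rather than the gradient itself, is the error-feedback idea: the transmitted vector stays bounded (which is what DP noise calibration needs), while the shift can still converge to a gradient of arbitrary size.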
