Gradient Descent with Linearly Correlated Noise: Theory and Applications to Differential Privacy
Anastasia Koloskova, Ryan McKenna, Zachary Charles, Keith Rush, Brendan McMahan
TL;DR
This paper analyzes gradient descent when the noise injected across iterations is linearly correlated, a setting motivated by the DP-FTRL and MF-DP-FTRL approaches to differential privacy. It develops a restart-based analytic framework that yields tighter convergence rates for both PGD and Anti-PGD under $L$-smoothness, and shows that convergence depends in a nuanced way on the row differences of the factorization component $B$ within a window of size $\tau$. Building on these insights, the authors propose a modified offline factorization objective, DP-MF+, which minimizes a $\Lambda_\tau$-weighted noise proxy to better capture optimization performance. They demonstrate theoretical improvements and validate them with synthetic and real-data experiments, including MNIST, CIFAR-10, and Stack Overflow tasks. The results illuminate how linearly correlated noise can be harnessed to improve privacy-utility trade-offs and guide the design of matrix-factorization-based DP mechanisms, while also identifying open questions such as clipping, momentum, and last-iterate convergence for broader noise structures.
Abstract
We study gradient descent under linearly correlated noise. Our work is motivated by recent practical methods for optimization with differential privacy (DP), such as DP-FTRL, which achieve strong performance in settings where privacy amplification techniques are infeasible (such as in federated learning). These methods inject privacy noise through a matrix factorization mechanism, making the noise linearly correlated over iterations. We propose a simplified setting that distills key facets of these methods and isolates the impact of linearly correlated noise. We analyze the behavior of gradient descent in this setting, for both convex and non-convex functions. Our analysis is demonstrably tighter than prior work and recovers multiple important special cases exactly (including anticorrelated perturbed gradient descent). We use our results to develop new, effective matrix factorizations for differentially private optimization, and highlight the benefits of these factorizations theoretically and empirically.
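To make "linearly correlated noise" concrete, the following is a minimal NumPy sketch rather than the authors' implementation. It assumes the simplified update $x_{t+1} = x_t - \gamma(\nabla f(x_t) + \sigma (Bz)_t)$ with i.i.d. Gaussian rows in $z$, a toy quadratic objective, and illustrative values for the step size and noise scale. The function name `correlated_noise_gd` is hypothetical. Taking $B = I$ gives independent noise (PGD); taking $B$ with $1$ on the diagonal and $-1$ on the subdiagonal gives the anticorrelated noise $z_t - z_{t-1}$ (Anti-PGD). The sketch omits clipping and privacy calibration, so it is not a DP mechanism.

```python
import numpy as np

def correlated_noise_gd(grad, x0, B, lr, sigma, rng):
    """Gradient descent where step t is perturbed by row t of sigma * (B @ z).

    z has i.i.d. standard normal rows, so the injected noise is a linear
    combination of them and is correlated across iterations through B.
    """
    T = B.shape[0]
    z = rng.standard_normal((T, x0.shape[0]))  # one Gaussian row per step
    noise = sigma * (B @ z)                    # linearly correlated noise
    x = x0.copy()
    for t in range(T):
        x = x - lr * (grad(x) + noise[t])
    return x

# Illustrative setup (all values are assumptions, not taken from the paper).
T, d = 200, 5
grad = lambda x: x                      # gradient of f(x) = 0.5 * ||x||^2
B_pgd = np.eye(T)                       # independent noise: z_t
B_anti = np.eye(T) - np.eye(T, k=-1)    # anticorrelated noise: z_t - z_{t-1}

rng = np.random.default_rng(0)
x_pgd = correlated_noise_gd(grad, np.ones(d), B_pgd, lr=0.1, sigma=1.0, rng=rng)
x_anti = correlated_noise_gd(grad, np.ones(d), B_anti, lr=0.1, sigma=1.0, rng=rng)
print("PGD      ||x_T||:", np.linalg.norm(x_pgd))
print("Anti-PGD ||x_T||:", np.linalg.norm(x_anti))
```

With the anticorrelated choice of $B$, the noise summed over all steps telescopes to the single term $z_{T-1}$, which is one intuition for why such noise can perturb the iterates less than independent noise of the same per-step scale.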
