Gradient Descent with Linearly Correlated Noise: Theory and Applications to Differential Privacy

Anastasia Koloskova, Ryan McKenna, Zachary Charles, Keith Rush, Brendan McMahan

TL;DR

This paper analyzes gradient descent when the noise injected across iterations is linearly correlated, a scenario motivated by the DP-FTRL and MF-DP-FTRL approaches in differential privacy. It develops a restart-based analytic framework that yields tighter convergence rates for both PGD and Anti-PGD under $L$-smoothness, and shows that convergence depends on the row differences of the factorization component $\mathbf{B}$ within a window of size $\tau$. Building on these insights, the authors propose a modified offline factorization objective, DP-MF$^+$, which minimizes a $\Lambda_{\tau}$-weighted noise proxy to better capture optimization performance. They demonstrate theoretical improvements and validate them with synthetic and real-data experiments, including MNIST, CIFAR-10, and Stack Overflow tasks. The results show how linearly correlated noise can be used to improve privacy-utility trade-offs and guide the design of matrix-factorization-based DP mechanisms, and they identify open questions such as clipping, momentum, and last-iterate convergence for broader noise structures.

Abstract

We study gradient descent under linearly correlated noise. Our work is motivated by recent practical methods for optimization with differential privacy (DP), such as DP-FTRL, which achieve strong performance in settings where privacy amplification techniques are infeasible (such as in federated learning). These methods inject privacy noise through a matrix factorization mechanism, making the noise linearly correlated over iterations. We propose a simplified setting that distills key facets of these methods and isolates the impact of linearly correlated noise. We analyze the behavior of gradient descent in this setting, for both convex and non-convex functions. Our analysis is demonstrably tighter than prior work and recovers multiple important special cases exactly (including anticorrelated perturbed gradient descent). We use our results to develop new, effective matrix factorizations for differentially private optimization, and highlight the benefits of these factorizations theoretically and empirically.
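The simplified setting described in the abstract can be illustrated with a small NumPy sketch. It assumes the update takes the form $x_{t+1} = x_t - \gamma\,(\nabla f(x_t) + (b_{t+1} - b_t)\xi)$, where $b_t$ is the $t$-th row of $\mathbf{B}$, $b_0 = 0$, and $\xi$ has i.i.d. Gaussian rows. This form is an assumption made here for illustration, as are the function name `gd_correlated_noise`, the toy quadratic objective, and all parameter values. Under this assumption, $\mathbf{B} = \mathbf{S}$ (the lower-triangular all-ones matrix) gives independent noise at each step, as in PGD, while $\mathbf{B} = \mathbf{I}$ gives anticorrelated noise $\xi_{t+1} - \xi_t$, as in Anti-PGD.

```python
import numpy as np

def gd_correlated_noise(grad, x0, B, gamma, sigma, rng):
    """Gradient descent with noise (b_{t+1} - b_t) @ xi added to each gradient.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    T, k = B.shape
    d = x0.shape[0]
    xi = sigma * rng.standard_normal((k, d))   # i.i.d. Gaussian noise seeds
    B_pad = np.vstack([np.zeros((1, k)), B])   # prepend b_0 = 0
    x = x0.copy()
    iterates = [x.copy()]
    for t in range(T):
        noise = (B_pad[t + 1] - B_pad[t]) @ xi  # linearly correlated noise
        x = x - gamma * (grad(x) + noise)
        iterates.append(x.copy())
    return np.array(iterates)

# Toy quadratic f(x) = 0.5 * ||x||^2 (assumed objective), gradient is x.
T, d = 50, 2
S = np.tril(np.ones((T, T)))   # prefix-sum matrix: independent noise (PGD)
I = np.eye(T)                  # identity: anticorrelated noise (Anti-PGD)
grad = lambda x: x
x0 = np.ones(d)

pgd = gd_correlated_noise(grad, x0, S, 0.1, 0.1, np.random.default_rng(0))
anti = gd_correlated_noise(grad, x0, I, 0.1, 0.1, np.random.default_rng(0))
print("PGD final norm:     ", np.linalg.norm(pgd[-1]))
print("Anti-PGD final norm:", np.linalg.norm(anti[-1]))
```

With a zero gradient the accumulated noise telescopes to $-\gamma\, b_T \xi$, which makes the difference between the two choices of $\mathbf{B}$ easy to see: a sum of all noise seeds for $\mathbf{B} = \mathbf{S}$, and only the last seed for $\mathbf{B} = \mathbf{I}$.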

Paper Structure

This paper contains 51 sections, 7 theorems, 82 equations, 6 figures, and 2 tables.

Key Result

Proposition 4.4

Under the noise, smoothness, and convexity assumptions (as:noise, as:smooth, as:convex), if $\mathbf{B} = \mathbf{S}$ and $\gamma < 1/(2L)$, then the output of the matrix-form optimization setup (eq:opt-setup-matrix) satisfies

Figures (6)

  • Figure 1: Two-stage MF-DP-FTRL workflow proposed by Denisov et al. (2022). The user selects a workload matrix $\mathbf{A}$ representing a desired first-order optimization method. Offline, the user finds a factorization $\mathbf{B}\mathbf{C} = \mathbf{A}$, using an objective that balances ERM performance (as a function of $\mathbf{B}$) and privacy (as a function of $\mathbf{C}$). The user applies $\mathbf{A}$ to a downstream ERM task, but with linearly correlated additive noise governed by $\mathbf{B}$.
  • Figure 2: Comparison of the average and last gradient norms for DP-MF and DP-MF$^+$ on a random non-strongly convex quadratic function with $L = 10$.
  • Figure 3: Test set accuracy of various mechanisms on the MNIST and CIFAR-10 datasets.
  • Figure 4: Comparison of PGD and Chess-PGD under a fixed stepsize, $\gamma = 0.02$. The y-axis is in log scale on the left and in linear scale on the right.
  • Figure 5: Elements of $\Lambda_{\tau}$ for $T = 12$ and $\tau = 3$.
  • ...and 1 more figure
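The factorization in the Figure 1 caption can be checked numerically. The sketch below assumes the mechanism adds Gaussian noise $\mathbf{Z}$ to $\mathbf{C}\mathbf{G}$ and then applies $\mathbf{B}$, so the released quantity is $\mathbf{B}(\mathbf{C}\mathbf{G} + \mathbf{Z}) = \mathbf{A}\mathbf{G} + \mathbf{B}\mathbf{Z}$. The helper name `mf_mechanism`, the prefix-sum workload, and the particular $\mathbf{C}$ are hypothetical choices for illustration; in particular, $\mathbf{C}$ here is not the result of the offline optimization the caption describes.

```python
import numpy as np

def mf_mechanism(A, C, G, Z):
    """Return (B, output) with B = A C^{-1} and output = B (C G + Z).

    Hypothetical helper: C is assumed square and invertible.
    """
    # Solve B C = A for B via the transposed system C^T B^T = A^T.
    B = np.linalg.solve(C.T, A.T).T
    return B, B @ (C @ G + Z)

T, d = 8, 3
rng = np.random.default_rng(0)

A = np.tril(np.ones((T, T)))                    # assumed prefix-sum workload
C = np.eye(T) + 0.5 * np.tril(np.ones((T, T)), -1)  # illustrative, unit lower triangular
G = rng.standard_normal((T, d))                 # stand-in for per-step gradients
Z = rng.standard_normal((T, d))                 # i.i.d. Gaussian noise

B, released = mf_mechanism(A, C, G, Z)

# The noise seen by the optimizer is B @ Z: correlated across iterations
# whenever B is not a scaled identity.
print("factorization error:", np.abs(B @ C - A).max())
print("mechanism error:    ", np.abs(released - (A @ G + B @ Z)).max())
```

The two trivial factorizations, $\mathbf{B} = \mathbf{A}, \mathbf{C} = \mathbf{I}$ and $\mathbf{B} = \mathbf{I}, \mathbf{C} = \mathbf{A}$, can be tried by swapping in a different `C`.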

Theorems & Definitions (12)

  • Example 2.1: SGD
  • Example 3.1: PGD
  • Example 3.2: Anti-PGD
  • Example 3.3: Tree Aggregation DP-FTRL
  • Proposition 4.4: Adapted from Dekel et al. (2012)
  • Proposition 4.5
  • Theorem 4.6: non-convex
  • Theorem 4.7: convex
  • Example A.1: Chess-PGD
  • Lemma C.1
  • ...and 2 more