Correlated Noise Provably Beats Independent Noise for Differentially Private Learning

Christopher A. Choquette-Choo, Krishnamurthy Dvijotham, Krishna Pillutla, Arun Ganesh, Thomas Steinke, Abhradeep Thakurta

TL;DR

The paper addresses privacy in learning by showing that correlated noise, implemented via DP-FTRL with Toeplitz noise, provably improves utility over standard DP-SGD. By transforming the training dynamics into linear time-invariant systems and applying frequency-domain analysis, it derives analytically optimal noise correlations, introduces the ν-DP-FTRL family, and provides sharp asymptotic and finite-time bounds for quadratic and strongly convex objectives. The key findings include an exponential separation between Noisy-SGD and ν-Noisy-FTRL that scales with the problem's effective dimension and condition number, and practical, low-cost methods that match or exceed prior state-of-the-art with significantly reduced computation. Empirical results on private image and language tasks validate the theory, showing notable gains over prior efficient mechanisms while maintaining scalability and memory efficiency.

Abstract

Differentially private learning algorithms inject noise into the learning process. While the most common private learning algorithm, DP-SGD, adds independent Gaussian noise in each iteration, recent work on matrix factorization mechanisms has shown empirically that introducing correlations in the noise can greatly improve their utility. We characterize the asymptotic learning utility for any choice of the correlation function, giving precise analytical bounds for linear regression and as the solution to a convex program for general convex functions. We show, using these bounds, how correlated noise provably improves upon vanilla DP-SGD as a function of problem parameters such as the effective dimension and condition number. Moreover, our analytical expression for the near-optimal correlation function circumvents the cubic complexity of the semi-definite program used to optimize the noise correlation matrix in previous work. We validate our theory with experiments on private deep learning. Our work matches or outperforms prior work while being efficient both in terms of compute and memory.
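As a rough illustration of the distinction the abstract draws between independent and correlated noise, the sketch below runs noisy gradient descent on a scalar objective. It rests on assumptions not stated on this page: the noise at step t is taken to be a Toeplitz-style weighted sum of i.i.d. Gaussian seeds, and the function names, the coefficient list `beta`, and the scalar setting are hypothetical. Clipping and privacy calibration are omitted, so the code is not private as written.

```python
import random


def correlated_noise(beta, seeds):
    """Return z_t = sum_s beta[s] * seeds[t - s] for each step t.

    `beta` is a hypothetical list of correlation coefficients.
    beta = [1.0] reproduces independent noise (the DP-SGD-style case).
    """
    out = []
    for t in range(len(seeds)):
        k = min(t, len(beta) - 1)  # only seeds from steps 0..t are available
        out.append(sum(beta[s] * seeds[t - s] for s in range(k + 1)))
    return out


def noisy_gradient_descent(grad, theta0, eta, beta, sigma, steps, rng):
    """Scalar gradient descent with (possibly correlated) Gaussian noise.

    Illustrative assumption: update theta <- theta - eta * (grad(theta) + z_t).
    No clipping or sensitivity calibration is done here.
    """
    seeds = [rng.gauss(0.0, sigma) for _ in range(steps)]
    noise = correlated_noise(beta, seeds)
    theta = theta0
    for t in range(steps):
        theta = theta - eta * (grad(theta) + noise[t])
    return theta


if __name__ == "__main__":
    rng = random.Random(0)
    quadratic_grad = lambda th: th  # gradient of f(theta) = theta^2 / 2
    print(noisy_gradient_descent(quadratic_grad, 1.0, 0.1, [1.0], 0.1, 200, rng))
    print(noisy_gradient_descent(quadratic_grad, 1.0, 0.1, [1.0, -0.5], 0.1, 200, rng))
```

With `beta = [1.0, -0.5]`, each step subtracts half of the previous step's seed, so successive noise terms are anti-correlated. This is one simple example of a correlation pattern, not the near-optimal one derived in the paper.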

Paper Structure

This paper contains 55 sections, 46 theorems, 230 equations, 4 figures, 4 tables, 1 algorithm.

Key Result

Theorem 1.1

DP-FTRL (Algorithm 1 with clipping enabled) satisfies $\rho$-zero concentrated differential privacy (zCDP) if the noise multiplier is taken as $\sigma_{\sf{dp}}^2 = \gamma_T^2({\bm{B}}) / (2\rho)$, where $\gamma_T({\bm{B}}) = \max_{t < T} \| ({\bm{B}}^{-1})_{:,t} \|_2$ is the sensitiv…
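The calibration in Theorem 1.1 can be sketched numerically. The example below assumes, for illustration only, that ${\bm{B}}$ is a lower-triangular Toeplitz matrix described by its first column `beta`; the function names are hypothetical. Under that assumption ${\bm{B}}^{-1}$ is also lower-triangular Toeplitz, and each of its columns is a shifted, truncated copy of the first column, so the maximum column norm is attained at column 0.

```python
import math


def toeplitz_inverse_first_column(beta):
    """First column c of B^{-1}, where B is lower-triangular Toeplitz
    with first column `beta` (assumed structure; requires beta[0] != 0).

    Solves B c = e_0 by forward substitution:
    c_0 = 1 / beta_0,  c_t = -(1 / beta_0) * sum_{s=1..t} beta_s * c_{t-s}.
    """
    if beta[0] == 0:
        raise ValueError("beta[0] must be nonzero for B to be invertible")
    c = [1.0 / beta[0]]
    for t in range(1, len(beta)):
        acc = sum(beta[s] * c[t - s] for s in range(1, t + 1))
        c.append(-acc / beta[0])
    return c


def sensitivity(beta):
    """gamma_T(B) = max_t ||(B^{-1})_{:,t}||_2 for the Toeplitz case.

    Column t of B^{-1} is column 0 shifted down and truncated,
    so column 0 has the largest Euclidean norm.
    """
    c = toeplitz_inverse_first_column(beta)
    return math.sqrt(sum(x * x for x in c))


def noise_variance(beta, rho):
    """sigma_dp^2 = gamma_T(B)^2 / (2 * rho), as stated in Theorem 1.1."""
    return sensitivity(beta) ** 2 / (2.0 * rho)


if __name__ == "__main__":
    # Identity B (independent noise) versus one simple correlated choice.
    print(noise_variance([1.0, 0.0, 0.0, 0.0], rho=0.5))
    print(noise_variance([1.0, -0.5, 0.0, 0.0], rho=0.5))
```

For the identity matrix the sensitivity is 1, so the variance reduces to $1/(2\rho)$. Any correlation that enlarges the columns of ${\bm{B}}^{-1}$ must be paid for with a larger noise multiplier.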

Figures (4)

  • Figure 1: Left: The ratio of the asymptotic suboptimalities of DP-FTRL to DP-SGD for mean estimation vs. the learning rate $\eta$. DP-FTRL is never worse but is orders of magnitude better as $\eta \to 0$ or $\eta \to 1$. Middle & Right: Time- and frequency-domain descriptions of the optimal noise coefficients for mean estimation (defined in \ref{thm:mean}).
  • Figure 2: Linear regression simulations: We plot the empirically observed asymptotic suboptimality of $\nu$-Noisy-FTRL/Noisy-SGD and their theoretical bounds with $d=128$ (varied in the left plot) where the Hessian ${\bm{H}}$ has eigenvalues $\lambda_k = 1/k$ (varied as $k^{-\alpha}$ for $\alpha \in [0.4, 1]$ in the middle plot), and learning rate $\eta = 0.02$ (varied in the right plot). The slopes of the corresponding empirical and theoretical lines are nearly equal, showing the tightness of the theory. In particular, we observe that Noisy-SGD has a linear dependence on the dimension (slope $1.00$) and is nearly constant w.r.t. the effective dimension (slope $0.18$), while Noisy-FTRL has a near-linear dependence on the effective dimension (slope $0.94$). Noisy-FTRL (slope $2.03$) also has a better dependence on the learning rate than Noisy-SGD (slope $1.27$).
  • Figure 3: DP-FTRL attains a tighter bound on $F_{\infty}$ as the condition number grows. Here, "Optimized" approximately minimizes \ref{eq:IQCbound}. The plots hold for smooth and strongly convex functions ($L=1=G, \sigma_{\sf{sgd}}=0$).
  • Figure 4: The proposed $\nu$-DP-FTRL outperforms all other efficient and anytime mechanisms. It also nearly equals or slightly outperforms the state-of-the-art "ME" mechanism that requires significantly more compute (cf. \ref{tab:exp_algorithms}). $^*$The non-private baseline for StackOverflow uses per-user clipping as this improves performance by $\approx 0.5$ pp.

Theorems & Definitions (95)

  • Theorem 1.1: denisov2022improved, bun2016concentrated
  • Remark 1.2
  • Theorem 2.1
  • proof: Proof Sketch
  • Theorem 2.2
  • Theorem 3.1
  • proof: Proof of \ref{thm:mean}
  • proof
  • proof
  • Lemma C.4
  • ...and 85 more