Correlating Cross-Iteration Noise for DP-SGD using Model Curvature

Xin Gu, Yingtai Xiao, Guanlin He, Jiamu Bai, Daniel Kifer, Kiwan Maeng

TL;DR

DP-SGD suffers from a large accuracy gap relative to non-private SGD. The authors propose NoiseCurve, a curvature-aware mechanism that uses Hessian eigenvalue information, estimated from public unlabeled data, to improve the cross-iteration noise correlation used by DP-MF. They derive a curvature-informed objective and present four strategies (S1–S4) for estimating and adapting curvature in real networks, including eigenvalue tail fitting for large models. Experiments on CIFAR-10 and ChestX-ray14 in convex and nonconvex regimes show consistent accuracy gains of 1–4 percentage points over DP-SGD and DP-BandMF. The work demonstrates that curvature-informed correlated noise is practical for privacy-preserving learning, while acknowledging limitations related to its dependence on public data and on eigenvalue estimation.

Abstract

Differentially private stochastic gradient descent (DP-SGD) offers the promise of training deep learning models while mitigating many privacy risks. However, there is currently a large accuracy gap between DP-SGD and normal SGD training. This has resulted in different lines of research investigating orthogonal ways of improving privacy-preserving training. One such line of work, known as DP-MF, correlates the privacy noise across different iterations of stochastic gradient descent -- allowing later iterations to cancel out some of the noise added to earlier iterations. In this paper, we study how to improve this noise correlation. We propose a technique called NoiseCurve that uses model curvature, estimated from public unlabeled data, to improve the quality of this cross-iteration noise correlation. Our experiments on various datasets, models, and privacy parameters show that the noise correlations computed by NoiseCurve offer consistent and significant improvements in accuracy over the correlation scheme used by DP-MF.
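To make the cross-iteration correlation concrete, here is a minimal NumPy sketch of the generic matrix-factorization idea behind DP-MF. It is not the NoiseCurve objective. It assumes a single pass over the data (each example participates once), a prefix-sum workload, and the well-known square-root factorization of the prefix-sum matrix as the correlation matrix. The function names are hypothetical and chosen for illustration only.

```python
import numpy as np

def sqrt_prefix_sum_factor(T):
    """Lower-triangular Toeplitz C with C @ C = A, where A is the T x T
    prefix-sum (all-ones lower-triangular) matrix. The entries are the
    Taylor coefficients of (1 - x)^(-1/2)."""
    coeffs = np.ones(T)
    for k in range(1, T):
        coeffs[k] = coeffs[k - 1] * (2 * k - 1) / (2 * k)
    C = np.zeros((T, T))
    for i in range(T):
        C[i, : i + 1] = coeffs[: i + 1][::-1]  # C[i, j] = coeffs[i - j]
    return C

def prefix_noise_error(C):
    """Total noise variance in all prefix sums when iteration t uses the
    noise (C^{-1} z)_t, with z scaled to the sensitivity of C
    (max column norm, assuming single participation)."""
    T = C.shape[0]
    A = np.tril(np.ones((T, T)))
    B = A @ np.linalg.inv(C)                 # A = B C
    sens = np.linalg.norm(C, axis=0).max()
    return sens**2 * np.linalg.norm(B, "fro") ** 2

def correlated_noise(C, dim, sigma, rng):
    """Per-iteration noise vectors: row t is added to the clipped gradient
    at step t. Rows are correlated, so later rows partly cancel earlier ones."""
    T = C.shape[0]
    sens = np.linalg.norm(C, axis=0).max()
    z = rng.normal(0.0, sigma * sens, size=(T, dim))
    return np.linalg.solve(C, z)             # C^{-1} z

T = 64
err_independent = prefix_noise_error(np.eye(T))              # DP-SGD: C = I
err_correlated = prefix_noise_error(sqrt_prefix_sum_factor(T))
print(err_independent, err_correlated)
```

With `C = I` the sketch reduces to independent per-step noise as in DP-SGD. The correlated choice pays a higher sensitivity but yields a much smaller accumulated error in the prefix sums, which is the trade-off that the choice of correlation matrix controls.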

Paper Structure

This paper contains 30 sections, 7 equations, 5 figures, 7 tables.

Figures (5)

  • Figure 1: Test accuracy (averaged over 3 runs) on CIFAR-10 for $\delta=10^{-5}$ as $\epsilon$ varies. Details in \ref{sec:experiments}. Curvature is obtained in an unsupervised way from TinyImageNet.
  • Figure 2: Largest 10,000 eigenvalues of four datasets estimated on two CNNs. Setup details are in \ref{app:expsetup}.
  • Figure 3: Change of the eigenvalues during training, (a) without and (b) with pretraining. The plot zooms in on values between -0.05 and 0.05. Experiment details are in \ref{app:expsetup}.
  • Figure 4: The true (positive) eigenvalues (in descending order) of a small CNN ($\sim$30,000 parameters) on the CIFAR-10 validation set and our approximated curve, using the top $k = 20, 200, 1000$ eigenvalues for fitting. We used $p_+ = 12{,}000$ and $\mu_{p_+} = 10^{-6}$. Details of the setup can be found in \ref{app:expsetup}.
  • Figure 5: Change of the Hessian eigenvalues during training, a full display of \ref{fig:eigen_no_change_a}. Despite significant negative eigenvalues at initialization, positive eigenvalues dominate.

Theorems & Definitions (1)

  • definition 1: approxdp
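
Assuming the label `approxdp` refers to the standard notion of approximate differential privacy (the listing gives only the label, not the statement), the definition is usually written as follows. A randomized mechanism $M$ satisfies $(\epsilon, \delta)$-differential privacy if, for all pairs of neighboring datasets $D, D'$ and all measurable output sets $S$:

```latex
\Pr[M(D) \in S] \;\le\; e^{\epsilon} \, \Pr[M(D') \in S] + \delta
```

The symbols $M$, $D$, $D'$, and $S$ are notation chosen here for illustration; the paper's own statement and its neighboring relation may differ.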