Differentially Private Learning Beyond the Classical Dimensionality Regime

Cynthia Dwork, Pranay Tankala, Linjun Zhang

TL;DR

The paper studies learning under differential privacy in the proportional dimensionality regime, where $d/n \to \delta$ for a constant $\delta \in (0, \infty)$, a setting in which privacy-utility tradeoffs are delicate and existing coarse error analyses are essentially vacuous. It develops an analytic framework that combines the Convex Gaussian Minimax Theorem (CGMT) with universality results for the CGMT and for general first-order methods (GFOM) to derive sharp error formulas, accurate up to a $1+o(1)$ factor, for objective perturbation, output perturbation, and DP-SGD in robust linear and logistic regression. The results reveal a double-descent-like phenomenon in the training error of objective perturbation for robust linear regression, and show that whether output or objective perturbation performs better depends on $\delta$ and the privacy parameters. Methodologically, the paper connects private learning with modern high-dimensional asymptotics. The findings are potentially relevant to privacy-preserving learning in genomics, neuroscience, and imaging, where feature dimensions grow comparably to, or exceed, sample sizes.

Abstract

We initiate the study of differentially private learning in the proportional dimensionality regime, in which the number of data samples $n$ and problem dimension $d$ approach infinity at rates proportional to one another, meaning that $d/n \to \delta$ as $n \to \infty$ for an arbitrary, given constant $\delta \in (0,\infty)$. This setting is significantly more challenging than that of all prior theoretical work in high-dimensional differentially private learning, which, despite the name, has assumed that $\delta = 0$ or is sufficiently small for problems of sample complexity $O(d)$, a regime typically considered "low-dimensional" or "classical" by modern standards in high-dimensional statistics. We provide sharp theoretical estimates of the error of several well-studied differentially private algorithms for robust linear regression and logistic regression, including output perturbation, objective perturbation, and noisy stochastic gradient descent, in the proportional dimensionality regime. The $1+o(1)$ factor precision of our error estimates enables a far more nuanced understanding of the price of privacy of these algorithms than that afforded by existing, coarser analyses, which are essentially vacuous in the regime we consider. Using our estimates, we discover a previously unobserved "double descent"-like phenomenon in the training error of objective perturbation for robust linear regression. We also identify settings in which output perturbation outperforms objective perturbation on average, and vice versa, demonstrating that the relative performance of these algorithms is less clear-cut than suggested by prior work. To prove our main theorems, we introduce several probabilistic tools that have not previously been used to analyze differentially private learning algorithms, such as a modern Gaussian comparison inequality and recent universality laws with origins in statistical physics.
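
To make two of the algorithms named in the abstract concrete, the following is a minimal Python sketch of objective perturbation and output perturbation for ridge-regularized Huber regression. It is an illustration under stated assumptions, not the paper's algorithm: the objective scaling, the solver (plain gradient descent), the function names, and the parameters `lam`, `nu`, and `sigma` are all assumed here. The noise levels are not calibrated to any sensitivity bound, so the sketch makes no privacy claim.

```python
import numpy as np

def huber_grad(r, L):
    """Derivative of the standard Huber loss with threshold L, applied entrywise."""
    return np.clip(r, -L, L)

def fit_huber(X, y, L, lam, linear_term=None, n_iter=2000):
    """Minimize sum_i H_L(y_i - x_i . beta) + (lam/2)||beta||^2 + linear_term . beta
    by gradient descent (assumed objective scaling, for illustration only)."""
    n, d = X.shape
    b = np.zeros(d) if linear_term is None else linear_term
    # The gradient is Lipschitz with constant at most ||X||_op^2 + lam.
    step = 1.0 / (np.linalg.norm(X, 2) ** 2 + lam)
    beta = np.zeros(d)
    for _ in range(n_iter):
        grad = -X.T @ huber_grad(y - X @ beta, L) + lam * beta + b
        beta -= step * grad
    return beta

def objective_perturbation(X, y, L, lam, nu, rng):
    """Add a random linear term nu * xi . beta to the objective, then minimize."""
    xi = rng.standard_normal(X.shape[1])
    return fit_huber(X, y, L, lam, linear_term=nu * xi)

def output_perturbation(X, y, L, lam, sigma, rng):
    """Minimize the unperturbed objective, then add Gaussian noise to the minimizer."""
    beta = fit_huber(X, y, L, lam)
    return beta + sigma * rng.standard_normal(beta.shape)
```

With `nu = 0` or `sigma = 0`, both functions reduce to the same non-private estimator, which gives a baseline against which either perturbation scheme can be compared.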

Paper Structure

This paper contains 41 sections, 38 theorems, 221 equations, 7 figures, 3 algorithms.

Key Result

Theorem 1.1

Let $(\sigma^\star, \tau^\star)$ denote the solution to the following system of two scalar equations in two variables $(\sigma, \tau)$, which we write in terms of $L, \lambda, \nu, \delta \in (0, \infty)$, dummy variables $Z_1, Z_2 \overset{\textit{iid}}{\sim} \mathcal{N}(0, 1)$, and $\kappa^2 = \dots$ [...] The output $\widehat{\bm{\beta}}$ of the objective perturbation algorithm with Huber loss satisfies [...]

Figures (7)

  • Figure 1: Theorem 1.1's predictions of the estimation error and truncated residuals of objective perturbation with Huber loss. Larger $\delta \in (0, \infty)$ corresponds to smaller $n/(n+d) \in (0, 1)$. Curves indicate theoretical predictions, and dots indicate the mean over $100$ simulations of the algorithm on synthetic data with $n \times d = 1000$. In the left plots, the perturbation magnitude is $\nu = 0$, corresponding to the non-private case, but in the right plots, $\nu = 1/5$. All plots use $L = 10$, $\kappa = 1$, $\bm{\beta}^\star \sim \mathcal{N}(\bm{0}, \kappa^2\bm{I}_d)$, $\bm{\varepsilon}^\star \sim \mathcal{N}(\bm{0}, (1/5)^2\bm{I}_n)$, $\bm{X} \sim \frac{1}{\sqrt{d}} \mathrm{Uniform}(\{-1,+1\}^{n \times d})$, and $\bm{y} = \bm{X}\bm{\beta}^\star + \bm{\varepsilon}^\star$.
  • Figure 2: Left: Comparison of error estimates for output perturbation (Corollary 1.5) and objective perturbation (Theorem 1.1) on Huber regression with $L = 1$, $\bm{\varepsilon}^\star \sim \mathcal{N}(\bm{0}, (1/10)^2\bm{I}_n)$. Right: Comparison of error estimates for output perturbation (Corollary 1.6) and objective perturbation (Theorem 1.3) for logistic regression. In both plots, $\kappa = 1$, $\bm{\beta}^\star \sim \mathcal{N}(\bm{0}, \kappa^2\bm{I}_d)$.
  • Figure 3: The functions $H_L$, $H_L'$, and $H_L''$ with $L = 1$.
  • Figure 4: The functions $\rho$, $\rho'$, $\rho''$, and $\mathrm{prox}_{\gamma \rho}$ with $\gamma = 3$.
  • Figure 5: Theorem \ref{thm:logistic-objective-perturbation}'s predictions of the error of Algorithm \ref{alg:objective-perturbation} with logistic loss. Estimation error refers to $\frac{1}{d}\lVert\widehat{\bm{\beta}} - \bm{\beta}^\star\rVert^2$. Difference of $\rho'$ refers to $\frac{1}{n}\lVert\rho'(\bm{X}\bm{\beta}^\star) - \rho'(\bm{X}\widehat{\bm{\beta}})\rVert^2$, which is related to the residual vector $\bm{y} - \rho'(\bm{X}\widehat{\bm{\beta}})$ since $\bm{y} \sim \mathrm{Bernoulli}(\rho'(\bm{X}\bm{\beta}^\star))$. Curves correspond to theoretical predictions, and dots correspond to the mean over $1000$ simulations of the algorithm on synthetic data with $n \times d = 1000$. In the left plots, the perturbation magnitude is $\nu = 0$, but in the right plots, $\nu = 1/5$. In all plots, the signal strength is $\kappa = 1$, and we consider $\bm{\beta}^\star \sim \mathcal{N}(\bm{0}, \kappa^2\bm{I}_d)$, along with $\bm{X} \sim \frac{1}{\sqrt{d}} \mathrm{Uniform}(\{-1,+1\}^{n \times d})$ and $\bm{y} \sim \mathrm{Bernoulli}(\rho'(\bm{X}\bm{\beta}^\star))$.
  • ...and 2 more figures
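
The synthetic-data setup described in the captions of Figures 1 and 5 can be sketched in Python as follows. The data distributions are taken from the captions. The function names are hypothetical, and the closed forms for $H_L$ and $\rho$ are assumptions: the standard Huber loss with threshold $L$, and the logistic loss $\rho(x) = \log(1 + e^x)$, whose derivative is the sigmoid (consistent with $\bm{y} \sim \mathrm{Bernoulli}(\rho'(\bm{X}\bm{\beta}^\star))$ in the Figure 5 caption).

```python
import numpy as np

def make_design(n, d, kappa, rng):
    """beta* ~ N(0, kappa^2 I_d) and X ~ (1/sqrt(d)) Uniform({-1,+1}^{n x d})."""
    beta_star = kappa * rng.standard_normal(d)
    X = rng.choice([-1.0, 1.0], size=(n, d)) / np.sqrt(d)
    return X, beta_star

def make_huber_data(n, d, kappa=1.0, noise_sd=0.2, rng=None):
    """Linear model y = X beta* + eps with eps ~ N(0, noise_sd^2 I_n) (Figure 1 setup)."""
    rng = np.random.default_rng() if rng is None else rng
    X, beta_star = make_design(n, d, kappa, rng)
    y = X @ beta_star + noise_sd * rng.standard_normal(n)
    return X, y, beta_star

def rho_d1(x):
    """Derivative of the assumed logistic loss rho(x) = log(1 + e^x): the sigmoid."""
    return 1.0 / (1.0 + np.exp(-np.asarray(x, dtype=float)))

def make_logistic_data(n, d, kappa=1.0, rng=None):
    """Binary labels y ~ Bernoulli(rho'(X beta*)) (Figure 5 setup)."""
    rng = np.random.default_rng() if rng is None else rng
    X, beta_star = make_design(n, d, kappa, rng)
    y = rng.binomial(1, rho_d1(X @ beta_star))
    return X, y, beta_star

def huber(x, L):
    """Assumed standard Huber loss: quadratic on [-L, L], linear outside."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) <= L, 0.5 * x**2, L * np.abs(x) - 0.5 * L**2)

def huber_d1(x, L):
    """First derivative of the Huber loss: x clipped to [-L, L]."""
    return np.clip(np.asarray(x, dtype=float), -L, L)

def huber_d2(x, L):
    """Second derivative (defined almost everywhere): indicator of |x| <= L."""
    return (np.abs(np.asarray(x, dtype=float)) <= L).astype(float)

def estimation_error(beta_hat, beta_star):
    """(1/d) * ||beta_hat - beta*||^2, the estimation error named in Figure 5."""
    return float(np.mean((beta_hat - beta_star) ** 2))
```

Passing a seeded generator, for example `make_huber_data(500, 500, rng=np.random.default_rng(0))`, makes repeated simulations reproducible when averaging over many runs as in the figures.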

Theorems & Definitions (69)

  • Theorem 1.1: Informal Version of Theorem \ref{thm:main-huber-objective-perturbation}
  • Remark 1.2
  • Theorem 1.3: Informal Version of Theorem \ref{thm:logistic-objective-perturbation}
  • Theorem 1.4: Informal Version of Theorem \ref{thm:rho-zcdp-bound}
  • Corollary 1.5: Informal Version of Corollary \ref{thm:main-huber-output-perturbation}
  • Corollary 1.6: Informal Version of Corollary \ref{thm:logistic-output-perturbation}
  • Definition 2.1: dwork2006dp
  • Definition 2.2: RDP, mironov2017renyi
  • Definition 2.3: zCDP, bun2016zcdp
  • Theorem 2.4: Lemma 2.5 of bun2016zcdp
  • ...and 59 more
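
For orientation, the three privacy notions named in Definitions 2.1 to 2.3 are usually stated as follows. These are the standard forms from the literature, given here on the assumption that the paper's definitions match them; the paper's exact notation may differ. The subscripts in $\delta_{\mathrm{DP}}$ and $\rho_{\mathrm{zCDP}}$ are a hypothetical notation introduced here to avoid a clash with the aspect ratio $\delta$ and the logistic loss $\rho$ used elsewhere. Here $M$ is a randomized algorithm, $D$ and $D'$ range over pairs of neighboring datasets, and $D_\alpha$ denotes the Rényi divergence of order $\alpha$.

$$
\begin{aligned}
&\text{$(\varepsilon, \delta_{\mathrm{DP}})$-DP:} && \Pr[M(D) \in S] \le e^{\varepsilon} \Pr[M(D') \in S] + \delta_{\mathrm{DP}} \quad \text{for all measurable } S, \\
&\text{$(\alpha, \varepsilon)$-RDP:} && D_\alpha\bigl(M(D) \,\|\, M(D')\bigr) \le \varepsilon, \\
&\text{$\rho_{\mathrm{zCDP}}$-zCDP:} && D_\alpha\bigl(M(D) \,\|\, M(D')\bigr) \le \rho_{\mathrm{zCDP}}\,\alpha \quad \text{for all } \alpha \in (1, \infty).
\end{aligned}
$$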