Less is More: Revisiting the Gaussian Mechanism for Differential Privacy

Tianxi Ji, Pan Li

TL;DR

The paper addresses the utility loss of high-dimensional differential privacy mechanisms by identifying a curse: Gaussian mechanisms with full-rank covariance incur large expected accuracy loss as dimension grows. It proposes the Rank-1 Singular Multivariate Gaussian (R1SMG) mechanism, which uses noise with a random rank-1 covariance, inspired by an overlooked clue in Dwork and Roth’s Gaussian mechanism analysis. The authors prove a sufficient condition for $(\epsilon,\delta)$-DP, show that the expected accuracy loss scales as $\mathcal{O}(1)$ in the large-dimension limit (with $C_R \to 2/\epsilon$), and demonstrate improved stability and lower noise magnitudes compared to existing mechanisms. Through case studies on 2D histograms, PCA, and DP deep learning, R1SMG achieves substantially better utility under very strict privacy budgets, indicating its strong practical potential for high-dimensional private data releases.

Abstract

Differential privacy via output perturbation has been a de facto standard for releasing query or computation results on sensitive data. However, we identify that all existing Gaussian mechanisms suffer from the curse of full-rank covariance matrices. To lift this curse, we design a Rank-1 Singular Multivariate Gaussian (R1SMG) mechanism. It achieves DP on high-dimensional query results by perturbing the results with noise following a singular multivariate Gaussian distribution, whose covariance matrix is a randomly generated rank-1 positive semi-definite matrix. In contrast, the classic Gaussian mechanism and its variants all consider deterministic full-rank covariance matrices. Our idea is motivated by a clue from Dwork et al.'s seminal work on the classic Gaussian mechanism that has been ignored in the literature: when projecting multivariate Gaussian noise with a full-rank covariance matrix onto an orthonormal basis, only the coefficient of a single basis vector can contribute to the privacy guarantee. This paper makes the following technical contributions. The R1SMG mechanism achieves a DP guarantee on high-dimensional query results, while its expected accuracy loss is lower bounded by a term that is of a lower order of magnitude, by at least the dimension of the query results, compared with existing Gaussian mechanisms. Compared with other mechanisms, the R1SMG mechanism is more stable and less likely to generate noise with large magnitude that overwhelms the query results, because the kurtosis and skewness of the nondeterministic accuracy loss introduced by this mechanism are larger than those introduced by other mechanisms.
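
To make the phrase "noise with a randomly generated rank-1 covariance matrix" concrete, here is a minimal Python sketch. It assumes the random direction `u` is drawn uniformly from the unit sphere, so the covariance is `sigma**2 * u u^T`. The function name and the value of `sigma` are illustrative assumptions; `sigma` is not calibrated to any $(\epsilon,\delta)$ budget, so this sketch alone carries no privacy guarantee and is not the authors' implementation.

```python
import numpy as np

def sample_rank1_gaussian_noise(dim, sigma, rng):
    """Draw noise n ~ N(0, sigma^2 * u u^T) for a random unit vector u.

    `sigma` is a free illustrative parameter here, NOT the paper's
    privacy calibration, so this function alone gives no DP guarantee.
    """
    # A normalized standard Gaussian vector is uniform on the unit sphere.
    g = rng.standard_normal(dim)
    u = g / np.linalg.norm(g)
    # A single scalar Gaussian coefficient along u yields covariance sigma^2 * u u^T.
    z = sigma * rng.standard_normal()
    return z * u, u

rng = np.random.default_rng(0)
dim, sigma = 1000, 2.0
query_result = np.zeros(dim)  # placeholder standing in for f(x)
noise, u = sample_rank1_gaussian_noise(dim, sigma, rng)
perturbed = query_result + noise

# The noise lies on the line spanned by u, so removing that component leaves ~0.
residual = noise - (noise @ u) * u
print(np.linalg.norm(residual))
```

Only one random scalar is drawn per release regardless of `dim`, which is the sense in which the noise is singular: it occupies a one-dimensional subspace of the output space.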

Paper Structure (24 sections, 14 theorems, 39 equations, 11 figures, 2 tables)

This paper contains 24 sections, 14 theorems, 39 equations, 11 figures, 2 tables.

Key Result

Proposition 1

The Identified Curse. Let $\bm{x}$ be a dataset, $f(\bm{x})\in \mathbb{R}^M$ the queried results, and $\mathbf{n}\in \mathbb{R}^M$ the perturbation noise introduced by the classic Gaussian mechanism (or its variants, e.g., the analytic Gaussian mechanism [balle2018improving], and the Matrix-Variate G…
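
The proposition is cut off here, so its exact bound is not reproduced. As background only, the following Python sketch illustrates a standard fact about full-rank isotropic Gaussian noise: with i.i.d. $\mathcal{N}(0,\sigma^2)$ entries in $\mathbb{R}^M$, the expected squared norm is $M\sigma^2$, which grows linearly with the dimension. The function name, `sigma`, and `trials` are illustrative assumptions, and `sigma` is held fixed instead of being calibrated to a privacy budget.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma, trials = 1.0, 2000  # illustrative values, not a DP calibration

def mean_sq_norm_full_rank(dim):
    """Monte Carlo estimate of E||n||^2 for n ~ N(0, sigma^2 * I_dim)."""
    noise = sigma * rng.standard_normal((trials, dim))
    return float(np.mean(np.sum(noise**2, axis=1)))

# Each estimate should be close to dim * sigma**2.
for dim in (10, 100, 1000):
    print(dim, mean_sq_norm_full_rank(dim))
```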

Figures (11)

  • Figure 1: (a) Geometric interpretation of $|\rho_1|+|\rho_2|$. Circumcircles of the described triangle with (b) $\theta<\frac{\pi}{2}$ and (c) $\theta>\frac{\pi}{2}$.
  • Figure 2: Geometric interpretations of the constraints on $\epsilon$ in the classic Gaussian and R1SMG mechanisms.
  • Figure 3: Visualization of the null space of the noise generated by the R1SMG mechanism. Left: $\mathbb{V}_{1,2}$. Right: $\mathbb{V}_{1,3}$.
  • Figure 4: Visualization of (a) non-private counts and (b)-(h) differentially private 2D counts obtained by the R1SMG, classic Gaussian, analytic Gaussian, MVG, MGM, DAWA, and $H_b$ mechanisms, respectively. $\epsilon$ is $10^{-5}$ for the R1SMG mechanism and 0.5 for the other mechanisms.
  • Figure 5: Accuracy loss introduced by different output perturbation mechanisms when $\delta = 10^{-7}$, $\epsilon = 10^{-5}$ for the R1SMG mechanism and $\epsilon = 0.5$ for the other mechanisms.
  • ...and 6 more figures

Theorems & Definitions (29)

  • Definition 1
  • Proposition 1
  • Definition 2
  • Definition 3
  • Theorem 1
  • Theorem 2
  • Definition 4
  • Theorem 3
  • Definition 5
  • Lemma 1
  • ...and 19 more