On the Privacy of Selection Mechanisms with Gaussian Noise

Jonathan Lebensold, Doina Precup, Borja Balle

TL;DR

This work revisits the analysis of Report Noisy Max and Above Threshold with Gaussian noise and shows that, under the additional assumption that the underlying queries are bounded, it is possible to provide pure ex-ante DP bounds for Report Noisy Max and pure ex-post DP bounds for Above Threshold.

Abstract

Report Noisy Max and Above Threshold are two classical differentially private (DP) selection mechanisms. Their output is obtained by adding noise to a sequence of low-sensitivity queries and reporting the identity of the query whose (noisy) answer satisfies a certain condition. Pure DP guarantees for these mechanisms are easy to obtain when Laplace noise is added to the queries. On the other hand, when instantiated using Gaussian noise, standard analyses only yield approximate DP guarantees despite the fact that the outputs of these mechanisms lie in a discrete space. In this work, we revisit the analysis of Report Noisy Max and Above Threshold with Gaussian noise and show that, under the additional assumption that the underlying queries are bounded, it is possible to provide pure ex-ante DP bounds for Report Noisy Max and pure ex-post DP bounds for Above Threshold. The resulting bounds are tight and depend on closed-form expressions that can be numerically evaluated using standard methods. Empirically we find these lead to tighter privacy accounting in the high privacy, low data regime. Further, we propose a simple privacy filter for composing pure ex-post DP guarantees, and use it to derive a fully adaptive Gaussian Sparse Vector Technique mechanism. Finally, we provide experiments on mobility and energy consumption datasets demonstrating that our Sparse Vector Technique is practically competitive with previous approaches and requires less hyper-parameter tuning.
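
As an illustration of the two selection mechanisms described above, here is a minimal Python sketch under stated assumptions. It is not the paper's algorithm listing: the function names, the `sigma_z` parameter for query noise, and the choice to draw the threshold noise once are assumptions made for this example. The threshold noise scale `sigma_x` and the public threshold correspond to the $\sigma_X$ and $\rho$ named in the figure captions. The sketch only shows how outputs are selected; it does not compute any privacy bound.

```python
import random


def gaussian_report_noisy_max(answers, sigma, rng):
    """Add N(0, sigma^2) noise to every answer and return the index of the largest."""
    noisy = [a + rng.gauss(0.0, sigma) for a in answers]
    return max(range(len(noisy)), key=noisy.__getitem__)


def gaussian_above_threshold(answers, threshold, sigma_x, sigma_z, rng):
    """Return the first index whose noisy answer meets the noisy threshold.

    Assumption for this sketch: the threshold noise (scale sigma_x) is drawn
    once, and each query gets fresh noise (scale sigma_z). Returns None if no
    query passes, so the stopping time is the returned index.
    """
    noisy_threshold = threshold + rng.gauss(0.0, sigma_x)
    for t, a in enumerate(answers):
        if a + rng.gauss(0.0, sigma_z) >= noisy_threshold:
            return t
    return None


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical bounded query answers in [0, 1].
    answers = [0.10, 0.35, 0.80, 0.60]
    print("noisy max index:", gaussian_report_noisy_max(answers, 0.15, rng))
    print("stopping index:", gaussian_above_threshold(answers, 0.5, 0.15, 0.15, rng))
```

In both cases the output is an index from a discrete set rather than a noisy real value, which is the property the abstract points to when discussing pure DP guarantees.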

Paper Structure

This paper contains 34 sections, 14 theorems, 34 equations, 20 figures, and 5 algorithms.

Key Result

Theorem 5

Suppose all queries given to the mechanism $M$ in Algorithm alg:Gaussian_at have sensitivity bounded by $\Delta$. Then for $\gamma > 1$, $\infty > \alpha >1$, where $T$ is a random variable indicating the stopping time of $M(D)$.

Figures (20)

  • Figure 1: Privacy accounting for $\Delta = 10^{-3}$, $\delta = 10^{-5}$. The heatmap shows where the ex-post Above Threshold analysis offers an improvement over the Gaussian Above Threshold. As a comparison, we simulate expected stopping times for a range of multipliers, for $[a,b] = [0, 1]$. The blue dotted line corresponds to the median stopping time when simulating 10k trials with a worst-case dataset. The red line corresponds to the 80th percentile. The plot shows a range of hyper-parameters as well as where the worst-case dataset is likely to halt. Our bounds provide improvements over the baseline below the blue line when squares are blue.
  • Figure 2: Ex-Post Above Threshold Privacy Loss ($\Delta = 0.001, \sigma_X = 0.15$). The ex-post privacy loss also changes as a function of the public threshold $\rho$. Note that if the mechanism halts after two timesteps, the minimum is observed when $\rho=0.5$. As $t$ increases, the privacy loss decreases as $\rho \to 1$.
  • Figure 3: Gaussian Report Noisy Max for $\Delta=0.01$. Numerical integration (green) compared to Monte Carlo estimate (beige) with 10B samples. Shaded region is the standard deviation over 100 trials. Numerical integration methods are deterministic; error bars only apply to Monte Carlo estimates, which are known to converge to the true value as the number of samples grows.
  • Figure 4: Scatter plot indicating the accuracy and final privacy spend over a range of noise multipliers for $\rho = 0.575$ with the UCI Bikes dataset. Final privacy loss ($\epsilon$) is reported for FSRC (green). Threshold noise, $\sigma_X$, was evaluated in the range of $[0.09, 0.15]$. A clear separation in privacy accounting occurs over a range of noise multipliers.
  • Figure 5: Scatter plot indicating the accuracy and final privacy spend over a range of noise multipliers for $\rho = 0.33$ with the LCL London Energy dataset. Privacy loss ($\epsilon$) is reported for FSRC (green). Threshold noise, $\sigma_X$, was evaluated in the range of $[0.04, 0.16]$. Our accounting method provides benefits when $\sigma_X = 0.04$.
  • ...and 15 more figures

Theorems & Definitions (26)

  • Definition 1: DP Dwork2006Calibrating
  • Definition 1: Privacy Loss Dinur2003
  • Definition 2: pDP Kasiviswanathan2008-ug
  • Definition 3: Rényi divergence Renyi1961-st
  • Definition 4: Rényi DP mironov2017renyi
  • Theorem 5: General RDP Bound on Gaussian Above Threshold zhu2020improving
  • Theorem 6: Gaussian Above Threshold RDP zhu2020improving
  • Theorem 7: Pure DP for Gaussian Report Noisy Max
  • Definition 8: Ex-Post DP Ligett2017-accfirst
  • Theorem 9: Pure Ex-post Gaussian Above Threshold
  • ...and 16 more