The Discrete Gaussian for Differential Privacy

Clément L. Canonne, Gautam Kamath, Thomas Steinke

TL;DR

This work theoretically and experimentally shows that adding discrete Gaussian noise provides essentially the same privacy and accuracy guarantees as adding continuous Gaussian noise, and presents a simple and efficient algorithm for exact sampling from this distribution.

Abstract

A key tool for building differentially private systems is adding Gaussian noise to the output of a function evaluated on a sensitive dataset. Unfortunately, using a continuous distribution presents several practical challenges. First and foremost, finite computers cannot exactly represent samples from continuous distributions, and previous work has demonstrated that seemingly innocuous numerical errors can entirely destroy privacy. Moreover, when the underlying data is itself discrete (e.g., population counts), adding continuous noise makes the result less interpretable. With these shortcomings in mind, we introduce and analyze the discrete Gaussian in the context of differential privacy. Specifically, we theoretically and experimentally show that adding discrete Gaussian noise provides essentially the same privacy and accuracy guarantees as adding continuous Gaussian noise. We also present a simple and efficient algorithm for exact sampling from this distribution. This demonstrates its applicability for privately answering counting queries, or more generally, low-sensitivity integer-valued queries.
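The abstract mentions an exact sampler. The following is a minimal Python sketch of one way to sample $\mathcal{N}_{\mathbb{Z}}(0,\sigma^2)$ exactly for rational $\sigma^2$, using only integer randomness and rational arithmetic: Bernoulli($e^{-\gamma}$) coins are built from Bernoulli($\gamma/k$) coins, a discrete Laplace proposal is built from those coins, and the proposal is accepted by rejection. This structure and every function name here are illustrative assumptions, not a transcription of the paper's algorithms. `random.Random` is used only so the sketch is reproducible; it is not a cryptographically secure source.

```python
import math
import random
from fractions import Fraction


def bernoulli(p: Fraction, rng: random.Random) -> int:
    """Exact Bernoulli(p) for a rational p in [0, 1], using integer randomness only."""
    return 1 if rng.randrange(p.denominator) < p.numerator else 0


def bernoulli_exp_neg(gamma: Fraction, rng: random.Random) -> int:
    """Exact Bernoulli(exp(-gamma)) for a rational gamma >= 0."""
    if gamma <= 1:
        # K = index of the first failing Bernoulli(gamma / K) trial;
        # P[K is odd] = 1 - gamma + gamma^2/2! - ... = exp(-gamma).
        k = 1
        while bernoulli(gamma / k, rng):
            k += 1
        return k % 2
    # For gamma > 1: exp(-gamma) = exp(-1)^floor(gamma) * exp(-(gamma - floor(gamma))).
    whole = math.floor(gamma)
    for _ in range(whole):
        if not bernoulli_exp_neg(Fraction(1), rng):
            return 0
    return bernoulli_exp_neg(gamma - whole, rng)


def discrete_laplace(t: int, rng: random.Random) -> int:
    """Sample Y in Z with P[Y = y] proportional to exp(-|y| / t), for an integer t >= 1."""
    while True:
        u = rng.randrange(t)
        if not bernoulli_exp_neg(Fraction(u, t), rng):
            continue  # keep u with probability exp(-u / t)
        v = 0
        while bernoulli_exp_neg(Fraction(1), rng):
            v += 1  # geometric: P[v] proportional to exp(-v)
        x = u + t * v  # x >= 0 with P[x] proportional to exp(-x / t)
        negative = bernoulli(Fraction(1, 2), rng)
        if negative and x == 0:
            continue  # reject "-0" so that 0 is not counted twice
        return -x if negative else x


def discrete_gaussian(sigma2: Fraction, rng: random.Random) -> int:
    """Sample from N_Z(0, sigma2): P[Y = y] proportional to exp(-y^2 / (2 * sigma2))."""
    sigma2 = Fraction(sigma2)
    # t = floor(sigma) + 1, computed exactly: floor(sqrt(p / q)) == isqrt(p * q) // q.
    t = math.isqrt(sigma2.numerator * sigma2.denominator) // sigma2.denominator + 1
    while True:
        y = discrete_laplace(t, rng)
        # Accept with probability exp(-(|y| - sigma2/t)^2 / (2 sigma2)), which is
        # proportional to the target/proposal ratio exp(-y^2/(2 sigma2) + |y|/t).
        gamma = (abs(y) - sigma2 / t) ** 2 / (2 * sigma2)
        if bernoulli_exp_neg(gamma, rng):
            return y


if __name__ == "__main__":
    rng = random.Random(0)  # seeded for reproducibility; use random.SystemRandom() in practice
    true_count = 42  # hypothetical integer-valued query answer q(x)
    print(true_count + discrete_gaussian(Fraction(4), rng))
```

For a counting query, the noisy answer is `true_count + discrete_gaussian(sigma2, rng)`. Because every step uses integer or `Fraction` arithmetic, no floating-point rounding enters the sampled value, which is the failure mode the abstract warns about.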

Paper Structure

This paper contains 24 sections, 27 theorems, 104 equations, 4 figures, 3 algorithms.

Key Result

Theorem 4

Let $\Delta,\varepsilon>0$. Let $q\colon \mathcal{X}^n \to \mathbb{Z}$ satisfy $|q(x)-q(x')|\le\Delta$ for all $x,x'\in\mathcal{X}^n$ differing on a single entry. Define a randomized algorithm $M\colon \mathcal{X}^n \to \mathbb{Z}$ by $M(x)=q(x)+Y$, where $Y \gets \mathcal{N}_{\mathbb{Z}}(0,\Delta^2/\varepsilon^2)$. Then $M$ satisfies $\frac{1}{2}\varepsilon^2$-concentrated differential privacy.
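The key result concerns a single query. For several counting queries, as in Figure 4, one rough way to account for privacy is sketched below under stated assumptions: each query with noise parameter $\sigma^2$ is taken to be $\rho = \Delta^2/(2\sigma^2)$-concentrated DP (the $\frac12\varepsilon^2$ guarantee with $\sigma^2 = \Delta^2/\varepsilon^2$), $\rho$ is assumed to add up over $k$ queries, and the simple conversion $\varepsilon = \rho + 2\sqrt{\rho\log(1/\delta)}$ is used to reach approximate DP. That conversion is not the tightest available, so the numbers need not match the paper's figures; all function names are hypothetical.

```python
import math


def zcdp_rho(sensitivity: float, sigma2: float, k: int = 1) -> float:
    """Total rho for k discrete Gaussian queries with noise parameter sigma2.

    Assumption: each query is Delta^2 / (2 * sigma2)-zCDP and rho adds up under composition.
    """
    return k * sensitivity ** 2 / (2 * sigma2)


def zcdp_to_eps(rho: float, delta: float) -> float:
    """Simple (not tightest) conversion: rho-zCDP implies (eps, delta)-DP with this eps."""
    return rho + 2 * math.sqrt(rho * math.log(1 / delta))


def sigma2_for_target(eps: float, delta: float, k: int, sensitivity: float = 1.0) -> float:
    """Smallest sigma2 whose converted guarantee is (eps, delta)-DP for k queries.

    Solves eps = rho + 2*sqrt(rho*L), with L = log(1/delta), as a quadratic in sqrt(rho).
    """
    log_term = math.log(1 / delta)
    rho = (math.sqrt(log_term + eps) - math.sqrt(log_term)) ** 2
    return k * sensitivity ** 2 / (2 * rho)


if __name__ == "__main__":
    # Hypothetical setting in the spirit of Figure 4: 100 counting queries (Delta = 1).
    print(zcdp_to_eps(zcdp_rho(1, 50 ** 2, k=100), delta=1e-6))  # fixed utility
    print(sigma2_for_target(1.0, 1e-6, k=100))  # fixed privacy (1, 1e-6)
```

Since $\varepsilon$ grows with $\rho$ and $\rho$ shrinks as $\sigma^2$ grows, solving the quadratic for the largest admissible $\rho$ yields the smallest noise parameter that meets the target.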

Figures (4)

  • Figure 1: Comparison of approximate $(\varepsilon,\delta)$-differential privacy guarantees ($\delta$ as a function of $\varepsilon$).
  • Figure 2: Comparison of tail bounds and variance for continuous, discrete, and rounded Gaussians.
  • Figure 3: Bounds on the normalization constant $\sum_{n\in\mathbb{Z}} e^{-n^2/(2\sigma^2)}$ from the paper's Fact on this constant, as a function of $\sigma$. The normalization constant of the continuous Gaussian, $\sqrt{2\pi\sigma^2}$ (in orange), becomes a very accurate approximation for $\sigma \gg 1$. For $\sigma \ll 1$ it is not: only the $n=0$ term contributes appreciably, so the true constant and both bounds from that Fact converge towards $1$, as expected, while $\sqrt{2\pi\sigma^2} \to 0$. Interestingly, the lower bound (green) empirically seems to be nearly tight, as it appears to coincide with the exact normalization constant (dotted blue) for all $\sigma > 0$. The discontinuity in the upper bound (orange) occurs at $\sigma=\frac{1}{\sqrt{2\pi}}$.
  • Figure 4: Comparison of discrete Gaussian and Laplace noise addition. Left: utility is fixed (i.e., answering $k=100$ counting queries, each with noise variance $50^2$), and we consider the curve of approximate $(\varepsilon,\delta)$-differential privacy guarantees that can be achieved. Right: privacy is fixed (i.e., approximate $(1,10^{-6})$-differential privacy), and we consider the utility (i.e., the variance of the noise added to each answer) as the number of counting queries varies.
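The quantities behind Figures 1 to 3 can be approximated by direct summation over a truncated integer support. The sketch below assumes a truncation radius of roughly $40\sigma$ (a hypothetical choice; the weights beyond it underflow to zero anyway), uses floating point because this is analysis rather than sampling, and computes the $(\varepsilon,\delta)$ curve as the hockey-stick divergence between $\mathcal{N}_{\mathbb{Z}}(0,\sigma^2)$ and $\mathcal{N}_{\mathbb{Z}}(\Delta,\sigma^2)$, assuming that shifted pair is the worst case for a sensitivity-$\Delta$ query. The closed form for the continuous Gaussian mechanism is included for comparison; all function names are illustrative.

```python
import math


def support(sigma2: float, shift: int = 0) -> range:
    """Truncated integer support: beyond ~40 sigma the weights underflow to 0.0 anyway."""
    radius = int(40 * math.sqrt(sigma2)) + 40 + abs(shift)
    return range(-radius, radius + 1)


def normalizer(sigma2: float) -> float:
    """Truncated sum over n in Z of exp(-n^2 / (2 sigma^2)), the constant in Figure 3."""
    return math.fsum(math.exp(-n * n / (2 * sigma2)) for n in support(sigma2))


def pmf(sigma2: float, mu: int = 0) -> dict:
    """Probabilities of N_Z(mu, sigma2) for integer mu (same normalizer as mu = 0)."""
    z = normalizer(sigma2)
    return {y: math.exp(-(y - mu) ** 2 / (2 * sigma2)) / z for y in support(sigma2, mu)}


def variance(sigma2: float) -> float:
    """Variance of N_Z(0, sigma2) by direct summation (compare Figure 2)."""
    return math.fsum(y * y * p for y, p in pmf(sigma2).items())


def delta_discrete(eps: float, sigma2: float, sensitivity: int = 1) -> float:
    """Hockey-stick divergence sum_y max(0, P(y) - e^eps Q(y)), P = N_Z(0), Q = N_Z(Delta)."""
    p = pmf(sigma2, 0)
    q = pmf(sigma2, sensitivity)  # its support contains that of p
    return math.fsum(max(0.0, p[y] - math.exp(eps) * q[y]) for y in p)


def delta_continuous(eps: float, sigma2: float, sensitivity: float = 1.0) -> float:
    """Closed-form delta(eps) of the continuous Gaussian mechanism, for comparison."""
    def phi(x: float) -> float:
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    s = math.sqrt(sigma2)
    a, b = sensitivity / (2 * s), eps * s / sensitivity
    return phi(a - b) - math.exp(eps) * phi(-a - b)
```

Sweeping `eps` over a grid gives a $\delta$-versus-$\varepsilon$ curve of the kind Figure 1 describes; for $\sigma^2 = 4$, $\Delta = 1$, $\varepsilon = 0.5$ the discrete and continuous values differ by less than $0.01$. For small $\sigma$ (e.g. $\sigma^2 = 0.25$) the computed variance falls noticeably below $\sigma^2$.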

Theorems & Definitions (57)

  • Definition 1: Discrete Gaussian
  • Definition 2: Pure/Approximate Differential Privacy
  • Definition 3: Concentrated Differential Privacy
  • Theorem 4: Discrete Gaussian Satisfies Concentrated Differential Privacy
  • Proposition 5
  • Lemma 6
  • proof
  • proof: Proof of the Rényi proposition (prop:renyi)
  • Theorem 7: Discrete Gaussian Satisfies Approximate Differential Privacy
  • Definition 8: Privacy Loss Random Variable
  • ...and 47 more
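The list names several definitions without stating them. For reference, their standard forms from the differential privacy literature are written out below; treat this as a sketch, since the paper's own statements may differ in presentation (for example, in the allowed range of $\mu$). The discrete Gaussian's denominator is the normalization constant plotted in Figure 3.

```latex
% Definition 1 (discrete Gaussian), mu in R, sigma > 0:
\Pr_{X \gets \mathcal{N}_{\mathbb{Z}}(\mu,\sigma^2)}[X = x]
  = \frac{e^{-(x-\mu)^2/(2\sigma^2)}}{\sum_{y\in\mathbb{Z}} e^{-(y-\mu)^2/(2\sigma^2)}},
  \qquad x \in \mathbb{Z}.

% Definition 2 ((eps, delta)-DP; pure DP is delta = 0):
% for all x, x' differing on a single entry and all events E,
\Pr[M(x)\in E] \le e^{\varepsilon}\,\Pr[M(x')\in E] + \delta.

% Definition 3 (rho-concentrated DP):
% for all x, x' differing on a single entry and all alpha in (1, infinity),
D_\alpha\big(M(x)\,\|\,M(x')\big) \le \rho\,\alpha.

% Definition 8 (privacy loss random variable), with Y drawn from M(x):
Z = \log\frac{\Pr[M(x)=Y]}{\Pr[M(x')=Y]}.
```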