Bayesian Inference Under Differential Privacy With Bounded Data

Zeki Kazan, Jerome P. Reiter

TL;DR

This paper describes Bayesian inference for the parameters of Gaussian models of bounded data protected by differential privacy, and demonstrates that analysts can and should take the constraints imposed by the bounds into account when specifying prior distributions.

Abstract

We describe Bayesian inference for the parameters of Gaussian models of bounded data protected by differential privacy. Using this setting, we demonstrate that analysts can and should take constraints imposed by the bounds into account when specifying prior distributions. Additionally, we provide theoretical and empirical results regarding what classes of default priors produce valid inference for a differentially private release in settings where substantial prior information is not available. We discuss how these results can be applied to Bayesian inference for regression with differentially private data.

Paper Structure

This paper contains 32 sections, 9 theorems, 60 equations, 12 figures, 2 algorithms.

Key Result

Theorem 1

Let $Y_1, \ldots, Y_n \in [a,b]$ and let $\tilde{Y}_i = (Y_i - a)/(b - a) \in [0,1]$. Let $\bar{Y}$ and $S^2$ be the sample mean and variance for $\{Y_i\}$ and let $\tilde{\bar{Y}}$ and $\tilde{S}^2$ be the sample mean and variance for $\{ \tilde{Y}_i\}$. Suppose each statistic is released via the L
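
The rescaling in the theorem statement can be checked numerically. The sketch below is illustrative only: the bounds, the data values, and the helper name `rescale_to_unit` are hypothetical and do not come from the paper. It shows just the standard identities linking the two sets of statistics, $\tilde{\bar{Y}} = (\bar{Y} - a)/(b - a)$ and $\tilde{S}^2 = S^2/(b - a)^2$, and makes no claim about the conclusion of the theorem, which is truncated above.

```python
import numpy as np


def rescale_to_unit(y, a, b):
    """Map values in [a, b] to [0, 1] via (y - a) / (b - a).

    Hypothetical helper for illustration; not taken from the paper.
    """
    y = np.asarray(y, dtype=float)
    if not b > a:
        raise ValueError("bounds must satisfy a < b")
    if np.any((y < a) | (y > b)):
        raise ValueError("all values must lie in [a, b]")
    return (y - a) / (b - a)


# Hypothetical bounded data with assumed bounds a = 0 and b = 100.
a, b = 0.0, 100.0
y = np.array([12.0, 30.5, 47.0, 8.25, 95.0])
y_tilde = rescale_to_unit(y, a, b)

# Sample mean and (unbiased, ddof=1) sample variance on both scales.
ybar, s2 = y.mean(), y.var(ddof=1)
ybar_t, s2_t = y_tilde.mean(), y_tilde.var(ddof=1)

# The mean shifts and scales; the variance only scales.
assert np.isclose(ybar_t, (ybar - a) / (b - a))
assert np.isclose(s2_t, s2 / (b - a) ** 2)
```

Because the map is affine, statistics computed on the $[0,1]$ scale can be converted back to the original scale with $\bar{Y} = a + (b - a)\,\tilde{\bar{Y}}$ and $S^2 = (b - a)^2\,\tilde{S}^2$.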

Figures (12)

  • Figure 1: Joint posterior distribution for $(\mu,\sigma^2)$ in Example \ref{ex:lead}. Here, $(\bar{Y}^* = 34.30, S^{2*} = 47.17^2)$ is represented by the red circle, the unreleased $(\bar{Y} = 32.08, S^2 = 16.96^2)$ is represented by the green triangle, the analyst's posterior mode is represented by the blue diamond, and $(\mu_0 = 12.5, S^2 = 3.8^2)$ is represented by the purple square. Upper and lower panels display the posterior when constraints are and are not accounted for, respectively. The shaded area represents the feasible region for $(\mu, \sigma^2)$ from Theorem \ref{thm:par_bounds}. Plots based on 5,000 Gibbs sampler iterations.
  • Figure 2: The average length (top) and coverage rate (bottom) of 95% HPD intervals for $\mu$ for different $n$. Results based on 10,000 simulated datasets $Y_i \in [0,1]$ released with $\varepsilon_1 = \varepsilon_2 = 0.1$ and analyzed with prior $p(\mu, \sigma^2) \propto 1$. Analyses that account for the constraints are shown as solid lines; those that do not are shown as dashed lines. The data-generating model is either ${\mathcal{N}}(\mu = 0.1, \sigma^2 = 0.04^2)$ (red) or ${\mathcal{N}}(\mu = 0.5, \sigma^2 = 0.2^2)$ (blue). Note that the red dashed line is below the blue dashed line. Each Gibbs sampler is run for 20,000 iterations.
  • Figure 3: Plot of the joint posterior distribution for $(\mu,\sigma^2)$ in Example \ref{ex:lead} under a uniform prior. The point $(\bar{Y}^* = 34.30, S^{2*} = 47.17^2)$ is represented by the red circle, the unreleased point $(\bar{Y} = 32.08, S^2 = 16.96^2)$ is represented by the green triangle, and the analyst's posterior mode is represented by the blue diamond. The upper and lower panels provide the posterior when constraints are and are not accounted for, respectively. The shaded area represents the feasible region for $(\mu, \sigma^2)$ from Theorem \ref{thm:par_bounds}. This plot is based on 5,000 Gibbs iterations.
  • Figure 4: Posterior predictive distribution of a new observation for the posterior draws from Example \ref{ex:lead} and prior $p(\mu, \sigma^2) \propto 1$. Upper and lower panels provide the posterior predictive distributions with and without accounting for constraints, respectively. The shaded area represents the feasible region for a new observation. Plots based on 100,000 Gibbs sampler iterations.
  • Figure 5: Plot of posterior draws from the linear regression method of Bernstein and Sheldon (2019). The left panels represent draws of the imputed sufficient statistics ${\mathbf Y}^\top {\mathbf 1}$ and ${\mathbf Y}^\top {\mathbf Y}$, while the right panels represent draws of the parameters $\theta_1$ and $\theta_0$. The upper and lower panels provide the posterior when constraints are and are not accounted for, respectively. The shaded areas represent the feasible regions; points outside the feasible regions are colored in red. This plot is based on 10,000 Gibbs iterations.
  • ...and 7 more figures

Theorems & Definitions (20)

  • Theorem 1
  • Definition 1
  • Theorem 2
  • Corollary 1
  • Theorem 3
  • Example 1
  • Theorem 4
  • Theorem 5
  • proof
  • Theorem 6
  • ...and 10 more