Bayesian Inference Under Differential Privacy With Bounded Data

Zeki Kazan; Jerome P. Reiter

TL;DR

We describe Bayesian inference for the parameters of Gaussian models of bounded data protected by differential privacy, and we demonstrate that analysts can and should take the constraints imposed by the bounds into account when specifying prior distributions.

Abstract

We describe Bayesian inference for the parameters of Gaussian models of bounded data protected by differential privacy. Using this setting, we demonstrate that analysts can and should take constraints imposed by the bounds into account when specifying prior distributions. Additionally, we provide theoretical and empirical results regarding what classes of default priors produce valid inference for a differentially private release in settings where substantial prior information is not available. We discuss how these results can be applied to Bayesian inference for regression with differentially private data.

Paper Structure (32 sections, 9 theorems, 60 equations, 12 figures, 2 algorithms)

INTRODUCTION
BACKGROUND AND SETTING
Differential Privacy
The Univariate Gaussian Setting
ENFORCING CONSTRAINTS
Constraints for the Gaussian Setting
Example: The Blood Lead Dataset
DEFAULT PRIOR CHOICES
Default Priors for the Gaussian Setting
Default Priors for Example \ref{ex:lead}
REGRESSION APPLICATION
DISCUSSION
RE-SCALING THE DATA
THE TRUNCATED GAMMA MIXTURE DISTRIBUTION
DERIVATION OF FULL CONDITIONALS FOR UNIVARIATE GAUSSIAN GIBBS SAMPLER
...and 17 more sections

Key Result

Theorem 1

Let $Y_1, \ldots, Y_n \in [a,b]$ and let $\tilde{Y}_i = (Y_i - a)/(b - a) \in [0,1]$. Let $\bar{Y}$ and $S^2$ be the sample mean and variance for $\{Y_i\}$ and let $\tilde{\bar{Y}}$ and $\tilde{S}^2$ be the sample mean and variance for $\{ \tilde{Y}_i\}$. Suppose each statistic is released via the L
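
The rescaling in the first sentence of the theorem implies two deterministic identities, $\tilde{\bar{Y}} = (\bar{Y} - a)/(b - a)$ and $\tilde{S}^2 = S^2/(b - a)^2$. The following minimal sketch checks them numerically. It covers only this rescaling step, not the privacy mechanism or the theorem's conclusion; the bounds, the sample size, and the helper names `rescale` and `rescaled_summaries` are hypothetical choices made for illustration.

```python
import random
import statistics


def rescale(ys, a, b):
    """Map each value from [a, b] to [0, 1] via (y - a) / (b - a)."""
    if not a < b:
        raise ValueError("bounds must satisfy a < b")
    return [(y - a) / (b - a) for y in ys]


def rescaled_summaries(mean_y, var_y, a, b):
    """Mean and variance of the rescaled data, computed from the
    original-scale mean and variance alone."""
    return (mean_y - a) / (b - a), var_y / (b - a) ** 2


random.seed(0)
a, b = 10.0, 90.0  # hypothetical bounds, not taken from the paper
ys = [random.uniform(a, b) for _ in range(50)]  # illustrative bounded data

tilde = rescale(ys, a, b)

# Summaries computed directly on the rescaled data
# (statistics.variance uses the n - 1 divisor, i.e. the sample variance).
direct = (statistics.mean(tilde), statistics.variance(tilde))

# The same summaries obtained by transforming the original-scale statistics.
via_formula = rescaled_summaries(
    statistics.mean(ys), statistics.variance(ys), a, b
)

print(direct)
print(via_formula)
```

The two printed pairs should agree up to floating-point error, which is why working on the unit interval and working on the original scale carry the same information about the sample mean and variance.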

Figures (12)

Figure 1: Joint posterior distribution for $(\mu,\sigma^2)$ in Example \ref{ex:lead}. Here, $(\bar{Y}^* = 34.30, S^{2*} = 47.17^2)$ is represented by the red circle, the unreleased $(\bar{Y} = 32.08, S^2 = 16.96^2)$ is represented by the green triangle, the analyst's posterior mode is represented by the blue diamond, and $(\mu_0 = 12.5, S^2 = 3.8^2)$ is represented by the purple square. Upper and lower panels display the posterior when constraints are and are not accounted for, respectively. The shaded area represents the feasible region for $(\mu, \sigma^2)$ from Theorem \ref{thm:par_bounds}. Plots based on 5,000 Gibbs sampler iterations.
Figure 2: The average length (top) and coverage rate (bottom) of 95% HPD intervals for $\mu$ for different $n$. Results based on 10,000 simulated datasets $Y_i \in [0,1]$ released with $\varepsilon_1 = \varepsilon_2 = 0.1$ and analyzed with prior $p(\mu, \sigma^2) \propto 1$. Analyses that account for the constraints are shown as solid lines; analyses that do not are shown as dashed lines. Data generating model is either ${\mathcal{N}}(\mu = 0.1, \sigma^2 = 0.04^2)$ (red) or ${\mathcal{N}}(\mu = 0.5, \sigma^2 = 0.2^2)$ (blue). Note that the red dashed line is below the blue dashed line. Each Gibbs sampler is run for 20,000 iterations.
Figure 3: Plot of the joint posterior distribution for $(\mu,\sigma^2)$ in Example \ref{ex:lead} under a uniform prior. The point $(\bar{Y}^* = 34.30, S^{2*} = 47.17^2)$ is represented by the red circle, the unreleased point $(\bar{Y} = 32.08, S^2 = 16.96^2)$ is represented by the green triangle, and the analyst's posterior mode is represented by the blue diamond. The upper and lower panels provide the posterior when constraints are and are not accounted for, respectively. The shaded area represents the feasible region for $(\mu, \sigma^2)$ from Theorem \ref{thm:par_bounds}. This plot is based on 5,000 Gibbs iterations.
Figure 4: Posterior predictive distribution of a new observation for the posterior draws from Example \ref{ex:lead} and prior $p(\mu, \sigma^2) \propto 1$. Upper and lower panels provide the posterior predictive distributions with and without accounting for constraints, respectively. The shaded area represents the feasible region for a new observation. Plots based on 100,000 Gibbs sampler iterations.
Figure 5: Plot of posterior draws from the linear regression method of Bernstein and Sheldon (2019). The left panels represent draws of the imputed sufficient statistics ${\mathbf Y}^\top {\mathbf 1}$ and ${\mathbf Y}^\top {\mathbf Y}$, while the right panels represent draws of the parameters $\theta_1$ and $\theta_0$. The upper and lower panels provide the posterior when constraints are and are not accounted for, respectively. The shaded areas represent the feasible regions; points outside the feasible regions are colored in red. This plot is based on 10,000 Gibbs iterations.
...and 7 more figures

Theorems & Definitions (20)

Theorem 1
Definition 1
Theorem 2
Corollary 1
Theorem 3
Example 1
Theorem 4
Theorem 5
proof
Theorem 6
...and 10 more
