Prior-itizing Privacy: A Bayesian Approach to Setting the Privacy Budget in Differential Privacy

Zeki Kazan; Jerome P. Reiter

TL;DR

This work reframes the challenge of setting the DP privacy budget $\varepsilon$ as a risk-management problem rooted in Bayesian disclosure analysis. By introducing risk profiles that bound the posterior-to-prior disclosure risk, the authors derive an explicit bound relating $\varepsilon$ to these risks and provide a minimization formulation to select the smallest $\varepsilon$ that satisfies the agency's constraints for all plausible priors. The framework applies to any DP mechanism, yields closed-form solutions for several profiles, and can accommodate more complex profiles via optimization, enabling tailored privacy-utility trade-offs without consuming additional privacy budget. Practically, this approach facilitates transparent, data-free calibration of privacy parameters and supports decision-makers in balancing confidentiality with data utility in real-world releases.

Abstract

When releasing outputs from confidential data, agencies need to balance the analytical usefulness of the released data with the obligation to protect data subjects' confidentiality. For releases satisfying differential privacy, this balance is reflected by the privacy budget, $\varepsilon$. We provide a framework for setting $\varepsilon$ based on its relationship with Bayesian posterior probabilities of disclosure. The agency responsible for the data release decides how much posterior risk it is willing to accept at various levels of prior risk, which implies a unique $\varepsilon$. Agencies can evaluate different risk profiles to determine one that leads to an acceptable trade-off in risk and utility.
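The idea of turning a risk profile into a privacy budget can be illustrated with a minimal sketch. It does not reproduce the paper's own bound. Instead it assumes the standard odds-ratio consequence of $\varepsilon$-DP, namely that posterior odds are at most $e^{\varepsilon}$ times prior odds, so that the posterior-to-prior ratio at prior $p$ is at most $e^{\varepsilon} / (e^{\varepsilon} p + 1 - p)$. Solving for $\varepsilon$ at a target ratio $r^*$ gives $\varepsilon \le \log\{ r^* (1 - p) / (1 - r^* p) \}$ whenever $r^* p < 1$. The function names, the prior grid, and the constant profile $r^* = 3$ are all hypothetical choices made for illustration, and the second argument $q_i$ that appears in the paper's profiles is ignored here.

```python
import math


def max_epsilon(prior, max_ratio):
    """Largest epsilon keeping posterior/prior <= max_ratio at this prior.

    Assumption (not the paper's theorem): posterior odds are bounded by
    exp(epsilon) times prior odds.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    if max_ratio < 1.0:
        raise ValueError("max_ratio must be at least 1")
    if max_ratio * prior >= 1.0:
        # The posterior cannot exceed 1, so the constraint is vacuous here.
        return math.inf
    return math.log(max_ratio * (1.0 - prior) / (1.0 - max_ratio * prior))


def posterior_bound(prior, epsilon):
    """Upper bound on the posterior probability under the same assumption."""
    odds = math.exp(epsilon) * prior / (1.0 - prior)
    return odds / (1.0 + odds)


def select_epsilon(risk_profile, priors):
    """Epsilon that satisfies the profile at every prior on the grid."""
    return min(max_epsilon(p, risk_profile(p)) for p in priors)


# Hypothetical agency: the posterior may be at most 3 times any prior.
priors = [k / 1000 for k in range(1, 1000)]
eps = select_epsilon(lambda p: 3.0, priors)
print(f"selected epsilon: {eps:.4f}")
```

For a constant profile the binding constraint sits at the smallest prior on the grid, and the selected value approaches $\log r^*$ as that prior shrinks toward zero. A profile that tolerates larger ratios at small priors would move the binding point elsewhere, which is the kind of trade-off the abstract describes.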

Paper Structure (24 sections, 10 theorems, 121 equations, 5 figures, 5 tables)

Introduction
Background and Motivation
Differential Privacy
Bayesian Measures of Disclosure Risk
Risk Profiles
Theoretical Results
Using Posterior-to-prior Risks for Setting Epsilon
Managing the Trade-off in Privacy and Utility
Relationship to Prior Work
Commentary
Supplemental Tables
Notation Summary
Closed Forms for Epsilon
Additional Results
Omitted Results
...and 9 more sections

Key Result

Lemma 1

Under Assumption 2 and Assumption 3, if the release of $T^* = t^*$ satisfies $\varepsilon$-DP, then for any subset ${\mathcal{S}}$ of the domain of $Y_i$, we have

Figures (5)

Figure 1: Each column corresponds to a particular hypothetical agency. The first row presents the agency's risk profile and the second row presents the profile's implied maximal allowable $\varepsilon$ at each point on the curve. Agency 1's risk profile is given by (\ref{eq:agency4}) with ${\tilde{a}} = 0.1$, ${\tilde{q}} = 1$, and ${\tilde{r}} = 3$, while Agency 2's risk profile is given by (\ref{eq:agency3}) with ${\tilde{a}} = 0.1$, ${\tilde{p}} = 0.05$, and ${\tilde{r}} = 3$.
Figure 2: The top panel presents the probability that the differentially private algorithm switches whether Durham County's released rate is above or below the 6.0 target (e.g., the added noise makes the released rate 6.5 when the actual rate is 5.5) for four hypothetical absolute differences between the true and target rates. The middle panel presents RMSEs of the noisy count of infant deaths. The bottom panel presents the implied $\varepsilon$. Each bar corresponds to a different risk profile of the form in (\ref{eq:agency_real}).
Figure 3: The risk profiles for three agencies with risk profile given by (\ref{eq:ex_1}). The lines in the lower panels represent the risk profiles for $q_i = 1$ as a function of $p_i$, and the colors represent the implied $\varepsilon$ at each point on the curve. The lines in the upper panels represent the corresponding baseline $r^*(p_i, 1) = {\tilde{r}}$. The left plots set ${\tilde{r}} = 1.5$, the center plots set ${\tilde{r}} = 3$, and the right plots set ${\tilde{r}} = 6$.
Figure 4: The risk profiles for three agencies with risk profile given by (\ref{eq:ex_2}). The lines represent the risk profiles for $p_i = 0.05$ as a function of $q_i$, and the colors represent the implied $\varepsilon$ at each point on the curve. The left plot sets ${\tilde{a}} = 0.025$, the center sets ${\tilde{a}} = 0.15$, and the right sets ${\tilde{a}} = 0.3$.
Figure 5: The top panel presents the $r^*(p_i, q_i)$ from (\ref{eq:2D_profile}) as a function of $p_i$ and $q_i$. The bottom panel presents the implied $\varepsilon_i(p_i, q_i)$ as a function of $p_i$ and $q_i$. The red point represents $\mathop{\mathrm{argmin}}\limits_{(p_i, q_i)} \varepsilon_i(p_i, q_i)$. For clarity of presentation, all $r^*(p_i, q_i) > 100$ are truncated to $100$ and all $\varepsilon_i(p_i, q_i) > 4$ are truncated to $4$.

Theorems & Definitions (26)

Definition 1: Geometric Mechanism
Definition 2: Relative Disclosure Risk
Definition 3: Absolute Disclosure Risk
Lemma 1
Theorem 1
Theorem 2
Example 1
Example 2
Example 3
Corollary 1
...and 16 more
