Within-Dataset Disclosure Risk for Differential Privacy

Zhiru Zhu; Raul Castro Fernandez

TL;DR

This work tackles the difficulty of interpreting and selecting the differential privacy parameter $\epsilon$ by introducing the Relative Disclosure Risk Indicator (RDR), a within-dataset, per-individual measure that complements the global DP bound. It defines an output-dependent RDR, derives per-individual risk bounds, and presents two algorithms, Find-$\epsilon$-from-RDR and Find-and-release-$\epsilon$-from-RDR, that let controllers express privacy preferences over RDRs and obtain suitable $\epsilon$ values. To handle multiple queries, the paper introduces a privacy-odometer framework and an SVT-based private release of $\epsilon$, enabling end-to-end DP guarantees without requiring a fixed total budget. The empirical evaluation includes an IRB-approved user study showing that RDR improves consistency in $\epsilon$ selection, and microbenchmarks demonstrating scalability to datasets with up to a million records. Overall, the approach makes DP more practical for real-world deployments by translating abstract privacy guarantees into actionable, per-individual risk considerations and by providing DP-safe mechanisms to manage multiple queries.

Abstract

Differential privacy (DP) enables private data analysis. In a typical DP deployment, controllers manage individuals' sensitive data and are responsible for answering analysts' queries while protecting individuals' privacy. They do so by choosing the privacy parameter $\epsilon$, which controls the degree of privacy for all individuals in all possible datasets. However, it is challenging for controllers to choose $\epsilon$ because of the difficulty of interpreting the privacy implications of such a choice on the within-dataset individuals. To address this challenge, we first derive a relative disclosure risk indicator (RDR) that indicates the impact of choosing $\epsilon$ on the within-dataset individuals' disclosure risk. We then design an algorithm to find $\epsilon$ based on controllers' privacy preferences expressed as a function of the within-dataset individuals' RDRs, and an alternative algorithm that finds and releases $\epsilon$ while satisfying DP. Lastly, we propose a solution that bounds the total privacy leakage when using the algorithm to answer multiple queries without requiring controllers to set the total privacy budget. We evaluate our contributions through an IRB-approved user study that shows the RDR is useful for helping controllers choose $\epsilon$, and experimental evaluations showing our algorithms are efficient and scalable.

Paper Structure (29 sections, 4 theorems, 16 equations, 9 figures, 3 algorithms)

Introduction
Background
Differential Privacy
Agents in DP Deployment
Relative Disclosure Risk Indicator
Why Choosing $\epsilon$ is Hard and RDR Overview
Output-dependent RDR
RDR: Formal Definition
Deriving Epsilon based on RDR
Using RDR to Express Privacy Preferences
The Find-$\epsilon$-from-RDR Algorithm
Releasing Epsilon Privately and Bounding Privacy Loss Across Queries
Finding and Releasing $\epsilon$ using SVT
Bounding Privacy Loss Across Queries
End-to-End DP Guarantee
...and 14 more sections

Key Result

Lemma 3.1

$RDR_i$ upper bound under Laplace Mechanism is

Figures (9)

Figure 1: Agents in standard DP deployment
Figure 2: Example of using RDR to find $\epsilon$. The query is to count the number of patients (column P) who have a certain disease (column D) and Laplace Mechanism is used to compute the query. For each $\epsilon$, we show the corresponding RDR of each within-dataset patient.
Figure 3: Dataflow of Find-$\epsilon$-from-RDR Algorithm
Figure 4: $\epsilon$ chosen by each participant
Figure 5: $\epsilon$ chosen by participants who do not know DP (left) and those who know DP (right)
...and 4 more figures
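
The query setup in the Figure 2 caption (a count over patients, answered with the Laplace Mechanism) can be illustrated with a minimal sketch. It only shows how $\epsilon$ sets the noise scale for a counting query with sensitivity 1; it does not compute the RDR, whose formula is not given in this summary. The toy table, the function names, and the sampling helper are assumptions made for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    # If X and Y are i.i.d. exponential with mean `scale`, X - Y ~ Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_count(records, predicate, epsilon, rng):
    """Answer a counting query with the Laplace Mechanism."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    sensitivity = 1.0  # adding or removing one record changes a count by at most 1
    return true_count + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical toy table: P is a patient id, D flags the disease.
patients = [
    {"P": "p1", "D": 1},
    {"P": "p2", "D": 0},
    {"P": "p3", "D": 1},
    {"P": "p4", "D": 1},
]

rng = random.Random(0)  # seeded only so the sketch is repeatable
for eps in (0.1, 1.0, 10.0):
    noisy = laplace_count(patients, lambda r: r["D"] == 1, eps, rng)
    print(f"epsilon={eps}: noisy count = {noisy:.2f}")
```

Smaller values of $\epsilon$ give a larger noise scale (here `1 / epsilon`), so the released count tends to sit further from the true count of 3.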

Theorems & Definitions (11)

Definition 1: Output-dependent Relative Disclosure Risk Indicator
Definition 2: Ex-post per-instance privacy loss [redberg2021privately]
Definition 3: Laplace Mechanism [dwork2014algorithmic]
Definition 4: Gaussian Mechanism [dwork2014algorithmic]
Definition 5: Relative Disclosure Risk Indicator
Lemma 3.1
Lemma 3.2
Definition 6: SVT query
Definition 7: Privacy Odometer of Find-$\epsilon$-from-RDR
Lemma A.1
...and 1 more
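
The list above includes an SVT query (Definition 6). SVT conventionally stands for the Sparse Vector Technique, and the sketch below shows its textbook AboveThreshold form for orientation only. It is not the paper's Find-and-release-$\epsilon$-from-RDR algorithm: how the paper builds its SVT queries from RDRs is not stated in this summary, and the function names and the list of precomputed query answers are assumptions.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def above_threshold(query_answers, threshold, epsilon, sensitivity=1.0, rng=None):
    """Textbook AboveThreshold: return the index of the first query whose
    noisy answer reaches the noisy threshold, or None if no query does."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    # The threshold is perturbed once, with scale 2 * sensitivity / epsilon.
    noisy_threshold = threshold + laplace_noise(2.0 * sensitivity / epsilon, rng)
    for i, answer in enumerate(query_answers):
        # Each answer gets fresh noise, with scale 4 * sensitivity / epsilon.
        noisy_answer = answer + laplace_noise(4.0 * sensitivity / epsilon, rng)
        if noisy_answer >= noisy_threshold:
            return i  # only the index is released, never the noisy value
    return None


# Hypothetical true answers to a stream of sensitivity-1 queries.
answers = [1.0, 2.0, 10.0, 3.0]
print(above_threshold(answers, threshold=5.0, epsilon=1.0, rng=random.Random(0)))
```

The mechanism halts at the first above-threshold query, which is why its privacy cost does not grow with the number of below-threshold queries seen before that point.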
