Reveal-or-Obscure: A Differentially Private Sampling Algorithm for Discrete Distributions

Naima Tasnim, Atefeh Gilani, Lalitha Sankar, Oliver Kosut

TL;DR

The paper addresses the problem of privately sampling a single observation from an unknown discrete distribution. It introduces Reveal-or-Obscure (ROO), a DP sampler that obscures the empirical distribution by mixing it with uniform sampling. ROO satisfies $\epsilon$-DP, and the paper proves a sampling complexity bound for it with an improved privacy-utility trade-off over prior work. The paper then extends ROO to Data-Specific ROO (DS-ROO), which makes the obscuring probability data-dependent through $m = n\min_x \hat{P}_{x^n}(x)$, proves that this adaptive scheme is also $\epsilon$-DP, and reports empirical evidence of utility gains. Together, the two algorithms offer a more efficient and flexible DP sampling mechanism for discrete distributions, with implications for privacy-preserving synthetic data and DP-based data analysis in restricted domains.
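The reveal-or-obscure idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the alphabet is assumed to be `{0, ..., k-1}`, neighboring datasets are assumed to differ by replacing one record, and the obscuring probability `q` is derived here from the mixture form alone rather than quoted from the paper. The DS-ROO map from `m` to the obscuring probability is not given in this summary, so only the statistic `m` is computed.

```python
import math
import random
from collections import Counter


def roo_obscure_probability(n: int, k: int, epsilon: float) -> float:
    """Smallest q for which q*Uniform + (1-q)*Empirical is epsilon-DP.

    Assumption: derived here for replace-one neighbors by bounding the
    ratio of output probabilities; not copied from the paper.
    """
    return k / (k + n * (math.exp(epsilon) - 1.0))


def roo_sample(data, k, epsilon, rng=random):
    """Return one sample from the alphabet {0, ..., k-1}."""
    q = roo_obscure_probability(len(data), k, epsilon)
    if rng.random() < q:
        # Obscure: ignore the data and sample uniformly from the alphabet.
        return rng.randrange(k)
    # Reveal: a uniformly chosen record is a draw from the empirical distribution.
    return rng.choice(data)


def ds_roo_statistic(data, k):
    """m = n * min_x P_hat(x), i.e. the smallest symbol count in the dataset."""
    counts = Counter(data)
    return min(counts.get(x, 0) for x in range(k))
```

For example, `roo_sample([0, 0, 1, 2, 3], k=4, epsilon=1.0)` returns a symbol in `{0, 1, 2, 3}`, and `ds_roo_statistic([0, 0, 1, 2, 3], 4)` returns the smallest count over the alphabet. Under this choice of `q`, a larger `n` or a larger `epsilon` lowers `q`, so the sampler reveals the data more often.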

Abstract

We introduce a differentially private (DP) algorithm called reveal-or-obscure (ROO) to generate a single representative sample from a dataset of $n$ observations drawn i.i.d. from an unknown discrete distribution $P$. Unlike methods that add explicit noise to the estimated empirical distribution, ROO achieves $\epsilon$-differential privacy by randomly choosing whether to "reveal" or "obscure" the empirical distribution. While ROO is structurally identical to Algorithm 1 proposed by Cheu and Nayak (arXiv:2412.10512), we prove a strictly better bound on the sampling complexity than that established in Theorem 12 of (arXiv:2412.10512). To further improve the privacy-utility trade-off, we propose a novel generalized sampling algorithm called Data-Specific ROO (DS-ROO), where the probability of obscuring the empirical distribution of the dataset is chosen adaptively. We prove that DS-ROO satisfies $\epsilon$-DP, and provide empirical evidence that DS-ROO can achieve better utility under the same privacy budget as vanilla ROO.
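The $\epsilon$-DP claim for a reveal-or-obscure mixture can be checked numerically on small inputs. The sketch below assumes the output distribution has the form $q/k + (1-q)\hat{P}(x)$ over an alphabet `{0, ..., k-1}` and that neighbors differ by replacing one record; the helper names are hypothetical, and the brute-force check is an illustration, not the paper's proof.

```python
import math
from collections import Counter


def output_distribution(data, k, q):
    """Exact output probabilities of the mixture q*Uniform + (1-q)*Empirical."""
    n = len(data)
    counts = Counter(data)
    return [q / k + (1.0 - q) * counts.get(x, 0) / n for x in range(k)]


def worst_log_ratio(data, k, q):
    """Largest |log ratio| of output probabilities over all replace-one neighbors.

    Requires q > 0 so that every output probability is strictly positive.
    """
    base = output_distribution(data, k, q)
    worst = 0.0
    for i in range(len(data)):
        for new_value in range(k):
            if new_value == data[i]:
                continue
            # Neighboring dataset: record i replaced by new_value.
            neighbor = data[:i] + [new_value] + data[i + 1:]
            other = output_distribution(neighbor, k, q)
            for a, b in zip(base, other):
                worst = max(worst, abs(math.log(a / b)))
    return worst
```

A mechanism of this form is $\epsilon$-DP on the given dataset's neighborhood when `worst_log_ratio(data, k, q) <= epsilon`. Trying a few values of `q` on a dataset that omits one symbol shows how quickly the ratio grows as `q` shrinks, which is the tension the obscuring probability has to balance.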

Paper Structure

This paper contains 10 sections, 6 theorems, 59 equations, 3 figures.

Key Result

Theorem 1

Given $q$, Algorithm alg:fixed-q-sampler is $\epsilon$-DP and $\alpha$-accurate for …, from which we can solve for $q$ to obtain the sampling complexity as …

Figures (3)

  • Figure 1: Plot of $q_m$ as a function of $m$ for fixed $k$ and $n$, showing changes under different privacy budgets $\epsilon$.
  • Figure 2: Comparison of true distribution and the estimated output distribution for DS-ROO.
  • Figure 3: Comparison of accuracy ($\alpha$) versus privacy ($\epsilon$) curves of differentially private sampling algorithms.

Theorems & Definitions (8)

  • Definition 1: Accuracy of Sampling [axelrod2020sample]
  • Definition 2: Differential Privacy [dwork2006calibrating]
  • Theorem 1
  • Lemma 1
  • Theorem 2
  • Lemma 2
  • Lemma 3
  • Lemma 4