Differentially Private Multi-Sampling from Distributions

Albert Cheu, Debanuj Nayak

TL;DR

This work extends differential privacy from single-sampling to private multi-sampling, formalizing strong and weak multi-sampling notions and exploring both finite-domain and Gaussian distribution families. It introduces new mechanisms, notably SubRR and ShuRR, to achieve pure and approximate DP with improved sample complexities, and derives a novel Euclidean-Laplace mechanism to handle $\ell_2$-sensitive vector sums. The authors show that multi-sampling can substantially beat repeated single-sampling in finite domains and, for Gaussians with known covariance, achieve pure DP sampling with favorable bounds and zCDP guarantees; they also establish lower bounds that separate strong and weak variants. Overall, the results provide a foundation for private synthetic data generation with multiple private samples, including algorithmic techniques, privacy analyses, and complexity trade-offs across DP variants, while highlighting open questions for approximate DP and qualitative notions of weak multi-sampling.

Abstract

Many algorithms have been developed to estimate probability distributions subject to differential privacy (DP): such an algorithm takes as input independent samples from a distribution and estimates the density function in a way that is insensitive to any one sample. A recent line of work, initiated by Raskhodnikova et al. (NeurIPS '21), explores a weaker objective: a differentially private algorithm that approximates a single sample from the distribution. Raskhodnikova et al. studied the sample complexity of DP \emph{single-sampling}, i.e., the minimum number of samples needed to perform this task. They showed that the sample complexity of DP single-sampling is less than the sample complexity of DP learning for certain distribution classes. We define two variants of \emph{multi-sampling}, where the goal is to privately approximate $m>1$ samples. This better models the realistic scenario where synthetic data is needed for exploratory data analysis. A baseline solution to \emph{multi-sampling} is to invoke a single-sampling algorithm $m$ times on independently drawn datasets of samples. When the data comes from a finite domain, we improve over the baseline by a factor of $m$ in the sample complexity. When the data comes from a Gaussian, Ghazi et al. (NeurIPS '23) show that \emph{single-sampling} can be performed under approximate differential privacy; we show it is possible to \emph{single- and multi-sample Gaussians with known covariance subject to pure DP}. Our solution uses a variant of the Laplace mechanism that is of independent interest. We also give sample complexity lower bounds, one for strong multi-sampling of finite distributions and another for weak multi-sampling of bounded-covariance Gaussians.

Paper Structure

This paper contains 36 sections, 31 theorems, 32 equations, 3 tables, 8 algorithms.

Key Result

Lemma 1

If $\mathbf{D}\approx_{\varepsilon,\delta}\mathbf{D}'$, then $\| \mathbf{D}-\mathbf{D}' \|_{\mathrm{TV}} \leq \frac{2\delta}{e^\varepsilon+1} + (e^\varepsilon-1)$.
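
As a quick sanity check of Lemma 1, the sketch below compares the stated bound against the exact total variation distance for binary randomized response, which is not taken from the paper but serves as a standard textbook example. It assumes that $\approx_{\varepsilon,\delta}$ denotes the usual $(\varepsilon,\delta)$-indistinguishability (each distribution's probability of any event is at most $e^\varepsilon$ times the other's plus $\delta$). The function names `tv_distance`, `lemma1_bound`, `randomized_response`, and `check` are hypothetical helpers introduced only for illustration.

```python
import math


def tv_distance(p, q):
    """Total variation distance between two discrete distributions (lists of probabilities)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def lemma1_bound(eps, delta):
    """Right-hand side of Lemma 1 exactly as stated above."""
    return 2 * delta / (math.exp(eps) + 1) + (math.exp(eps) - 1)


def randomized_response(eps):
    """Output distributions of binary randomized response on inputs 0 and 1.

    Assumption: the true bit is reported with probability e^eps / (e^eps + 1),
    so the two output distributions are (eps, 0)-indistinguishable.
    """
    keep = math.exp(eps) / (math.exp(eps) + 1)
    return [keep, 1 - keep], [1 - keep, keep]


def check(eps):
    """Return (exact TV distance, Lemma 1 bound with delta = 0) for randomized response."""
    d0, d1 = randomized_response(eps)
    return tv_distance(d0, d1), lemma1_bound(eps, 0.0)


if __name__ == "__main__":
    for eps in (0.1, 0.5, 1.0, 2.0):
        tv, bound = check(eps)
        print(f"eps={eps}: TV={tv:.4f}, bound={bound:.4f}")
```

For this pair of distributions the exact distance is $(e^\varepsilon-1)/(e^\varepsilon+1)$, which stays below the lemma's right-hand side for every $\varepsilon \geq 0$, so the example is consistent with the statement without showing that the bound is tight.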

Theorems & Definitions (60)

  • Lemma 1
  • proof
  • Definition 2: Pure DP
  • Definition 3: Zero-Concentrated DP
  • Definition 4: Approximate DP
  • Definition 5: Single-Sampling [RSSS21]
  • Definition 6: Strong Multi-sampling
  • Definition 7: Weak Multi-sampling
  • Lemma 8: From Single to Weak Multi-sampling
  • Lemma 9: From Weak to Strong Multi-sampling
  • ...and 50 more