
Black-Box Differentially Private Nonparametric Confidence Intervals Under Minimal Assumptions

Tomer Shoham, Moshe Shenfeld, Noa Velner-Harris, Katrina Ligett

TL;DR

This work tackles constructing differentially private nonparametric confidence intervals for arbitrary statistics under minimal distributional assumptions. It introduces PrivSub, a black-box DP framework that repeatedly subsamples data, applies a private estimator, and post-processes the resulting empirical CDF to form a DP $(1-\alpha)$-CI, with privacy amplification driven by subsampling. The authors prove $(\varepsilon,\delta)$-DP guarantees, $\tau_n$-consistency of the private estimators, and asymptotic validity and tightness of the resulting CIs, along with a consistent private CDF of the statistic. Empirical results show PrivSub is competitive with state-of-the-art methods, offering a general, end-to-end DP approach for uncertainty quantification that scales beyond parametric settings and supports multi-level inference through the full CDF estimate.
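The TL;DR attributes part of the privacy guarantee to amplification by subsampling. As a rough illustration, and explicitly not the paper's own accounting (which is not reproduced on this page), the standard amplification bound for running an $(\varepsilon,\delta)$-DP mechanism on a uniformly random size-$m$ subset drawn without replacement from $n$ records, under replace-one adjacency, is $\varepsilon' = \log(1 + \tfrac{m}{n}(e^{\varepsilon}-1))$ and $\delta' = \tfrac{m}{n}\delta$. The helper names below are hypothetical, and the basic-composition step is only a loose upper bound over $T$ calls.

```python
import math


def amplified_by_subsampling(eps, delta, m, n):
    """Standard amplification bound (eps', delta') for an (eps, delta)-DP
    mechanism run on a random size-m subset (without replacement) of n records,
    assuming replace-one adjacency. Hypothetical helper, not the paper's API."""
    if not 0 < m <= n:
        raise ValueError("need 0 < m <= n")
    q = m / n  # sampling rate
    # log1p/expm1 keep precision when q * (e^eps - 1) is small.
    return math.log1p(q * math.expm1(eps)), q * delta


def basic_composition(eps, delta, T):
    """Loose upper bound for T sequential calls (basic composition)."""
    return T * eps, T * delta


# Illustrative numbers only: m = n^{2/3}, eps = 2 per call, T = 50 calls.
n = 10_000
m = round(n ** (2 / 3))
eps_sub, delta_sub = amplified_by_subsampling(2.0, 0.0, m, n)
eps_total, delta_total = basic_composition(eps_sub, delta_sub, T=50)
```

Tighter composition theorems exist; basic composition is used here only because it is simple and always valid.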

Abstract

We introduce a simple, general framework that takes any differentially private estimator of an arbitrary quantity as a black box and constructs from it a differentially private nonparametric confidence interval for that quantity. Our approach repeatedly subsamples the data, applies the private estimator to each subsample, and then post-processes the resulting empirical CDF into a confidence interval. Our analysis uses the randomness from the subsampling to achieve privacy amplification. Under mild assumptions, the empirical CDF we obtain approaches the CDF of the private statistic as the sample size grows. We use this to show that the confidence intervals we estimate are asymptotically valid, tight, and equivalent to their non-private counterparts. We provide empirical evidence that our method performs well compared with the (less general) state-of-the-art algorithms.
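The subsample, estimate, and post-process pipeline described in the abstract can be sketched as follows. This is a minimal sketch under stated assumptions, not PrivSub itself: the black-box private estimator is a hypothetical Laplace-noised clipped mean (`dp_clipped_mean`), the rate is assumed to be $\tau_n=\sqrt{n}$, subsamples are drawn without replacement, and the empirical CDF is post-processed with the classical equal-tailed subsampling quantiles. How the privacy budget is split across the $T+1$ estimator calls is not modeled.

```python
import numpy as np


def dp_clipped_mean(x, eps, lo, hi, rng):
    """eps-DP mean via the Laplace mechanism (assumes replace-one adjacency).

    Clipping to [lo, hi] bounds the mean's sensitivity by (hi - lo) / len(x).
    """
    x = np.clip(np.asarray(x, dtype=float), lo, hi)
    scale = (hi - lo) / (len(x) * eps)
    return float(x.mean() + rng.laplace(0.0, scale))


def subsampling_ci(x, estimator, m, T, alpha, rng, tau=np.sqrt):
    """Equal-tailed subsampling CI around a black-box estimator.

    Hypothetical helper: privacy budget splitting across the T + 1
    estimator calls is NOT modeled here.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    theta_n = estimator(x)  # full-sample estimate
    # Roots tau_m * (theta_hat_m - theta_hat_n) on T random size-m subsamples.
    roots = np.array([
        tau(m) * (estimator(rng.choice(x, size=m, replace=False)) - theta_n)
        for _ in range(T)
    ])
    # Quantiles of the empirical CDF of the roots.
    q_lo, q_hi = np.quantile(roots, [alpha / 2, 1 - alpha / 2])
    # Invert q_lo <= tau_n * (theta_hat_n - theta) <= q_hi for theta.
    return theta_n - q_hi / tau(n), theta_n - q_lo / tau(n)


# Usage on synthetic data (illustrative settings only).
rng = np.random.default_rng(0)
data = rng.normal(0.0, 1.0, size=2000)
m = round(len(data) ** (2 / 3))
dp_mean = lambda s: dp_clipped_mean(s, eps=1.0, lo=-5.0, hi=5.0, rng=rng)
lower, upper = subsampling_ci(data, dp_mean, m=m, T=200, alpha=0.1, rng=rng)
```

Because the estimator is passed in as a function, any other private statistic (for example a private median) can be swapped in without touching `subsampling_ci`, which is the black-box property the abstract emphasizes.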

Paper Structure

This paper contains 29 sections, 15 theorems, 44 equations, 10 figures, and 1 algorithm.

Key Result

Theorem 2.2

Under the standard subsampling setting, we have that $\widehat{U}_{n,m}(x) \xrightarrow{p} U(x)$ for any continuity point $x$ of $U$. Furthermore, if $U(\cdot)$ is continuous, then $\sup_{x}\big|\widehat{U}_{n,m}(x) - U(x)\big| \xrightarrow{p} 0$.
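For reference, the objects in Theorem 2.2 under the classical subsampling definitions of Politis, Romano, and Wolf can be written as below. This is a sketch that assumes $\hat\theta_{m,t}$ is the estimate on the $t$-th of $T$ random size-$m$ subsamples, $\hat\theta_n$ is the full-sample estimate, and $U$ is the limiting CDF of $\tau_n(\hat\theta_n - \theta)$; the paper's exact notation may differ (the classical version averages over all $\binom{n}{m}$ subsets rather than $T$ random ones).

```latex
\begin{aligned}
\widehat{U}_{n,m}(x) &= \frac{1}{T}\sum_{t=1}^{T}
  \mathbf{1}\!\left\{ \tau_m\big(\hat\theta_{m,t} - \hat\theta_n\big) \le x \right\},
\qquad
\hat c_{n,m}(\beta) = \inf\{\, x : \widehat{U}_{n,m}(x) \ge \beta \,\}, \\[4pt]
\mathrm{CI}_{1-\alpha} &= \Big[\, \hat\theta_n - \tau_n^{-1}\,\hat c_{n,m}(1-\alpha/2),\;
  \hat\theta_n - \tau_n^{-1}\,\hat c_{n,m}(\alpha/2) \,\Big],
\qquad m \to \infty,\quad \frac{m}{n} \to 0,\quad \frac{\tau_m}{\tau_n} \to 0 .
\end{aligned}
```

The last three conditions are the usual requirements of the classical subsampling theorem; uniform convergence of $\widehat{U}_{n,m}$ to a continuous $U$ is what makes the quantile-based interval asymptotically valid.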

Figures (10)

  • Figure 1: A comparison of our method (PrivSub) to the other known general, nonparametric DP CI method: the BLB-based method $\texttt{BLBquant}$ (chadha2024resampling). We include two baselines, the private baseline tailored to the median ($\texttt{ExpMech}$, drechsler2022nonparametric) and the non-private baseline (bootstrapping), and study $0.9$-CI estimation of the median for the (truncated) normal, exponential, and Gaussian mixture distributions under $(5, 0)$-DP. A detailed discussion appears in the numerical study section.
  • Figure 2: Empirical CDF of the median from a single run of PrivSub and its non-private counterpart with $m=n^{2/3}$, $T=50$, and $\varepsilon=2$, compared to the theoretical distribution, for several sample sizes. The data are drawn from a normal distribution with mean $0$ and standard deviation $2$, truncated to $[-6,4]$.
  • Figure 3: We compare our method (PrivSub) against two baselines: the private baseline tailored to the median ($\texttt{ExpMech}$, drechsler2022nonparametric) and the non-private baseline (bootstrapping). We evaluate $(1-\alpha)$-CI estimation of the median, with $1-\alpha=0.9$, for the (truncated) normal, exponential, and Gaussian mixture distributions under $(2,0)$-DP. A detailed discussion appears in the numerical study section.
  • Figure 4: A comparison of our method (PrivSub) in terms of CI width (top row) and coverage (bottom row) for the mean under $\varepsilon_t=5$. We include two baselines: the private baseline tailored to the mean (the Laplace noise-addition mechanism) and the non-private baseline (bootstrapping). We study $0.9$-CI estimation of the mean for three distributions, as described in the figure, where $\mathcal{R}$ denotes the truncation range. A detailed discussion appears in the numerical study section.
  • Figure 5: A comparison of our method (PrivSub) in terms of CI width (top row) and coverage (bottom row) for the mean under $\varepsilon_t=2$. We include two baselines: the private baseline tailored to the mean (the Laplace noise-addition mechanism) and the non-private baseline (bootstrapping). We study $0.9$-CI estimation of the mean for three distributions, as described in the figure, where $\mathcal{R}$ denotes the truncation range. A detailed discussion appears in the numerical study section.
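The Figure 2 setup (truncated normal data, $m=n^{2/3}$, $T=50$, $\varepsilon=2$) can be approximated with the sketch below, which builds the empirical CDF of private subsample medians. Several pieces are assumptions rather than the paper's code: the private median is the standard exponential-mechanism median over a bounded range (a swapped-in technique, not necessarily the exact $\texttt{ExpMech}$ variant of drechsler2022nonparametric), $\varepsilon=2$ is applied per call purely for illustration, the medians are left uncentered and unscaled, and all helper names are hypothetical.

```python
import numpy as np


def truncated_normal(n, mean, sd, lo, hi, rng):
    """Rejection sampler for N(mean, sd^2) truncated to [lo, hi]."""
    out = np.empty(0)
    while out.size < n:
        draw = rng.normal(mean, sd, size=2 * n)
        out = np.concatenate((out, draw[(draw >= lo) & (draw <= hi)]))
    return out[:n]


def dp_median_expmech(x, eps, lo, hi, rng):
    """eps-DP median via the exponential mechanism on [lo, hi].

    Interval k (between the k-th and (k+1)-th sorted points) has k points
    below it; utility -|k - n/2| has sensitivity 1, so weights are
    width * exp(eps * utility / 2).
    """
    z = np.sort(np.clip(np.asarray(x, dtype=float), lo, hi))
    n = z.size
    edges = np.concatenate(([lo], z, [hi]))
    widths = np.diff(edges)  # n + 1 candidate intervals
    utility = -np.abs(np.arange(n + 1) - n / 2)
    with np.errstate(divide="ignore"):  # zero-width intervals get weight 0
        logw = np.log(widths) + eps * utility / 2
    p = np.exp(logw - logw.max())
    p /= p.sum()
    i = rng.choice(n + 1, p=p)
    return rng.uniform(edges[i], edges[i + 1])  # uniform within the interval


def ecdf(values, grid):
    """Empirical CDF of `values` evaluated at each point of `grid`."""
    v = np.sort(np.asarray(values, dtype=float))
    return np.searchsorted(v, grid, side="right") / v.size


rng = np.random.default_rng(1)
n, T, eps = 1000, 50, 2.0
lo, hi = -6.0, 4.0
m = round(n ** (2 / 3))  # m = n^{2/3}, as in Figure 2
data = truncated_normal(n, 0.0, 2.0, lo, hi, rng)
sub_medians = np.array([
    dp_median_expmech(rng.choice(data, size=m, replace=False), eps, lo, hi, rng)
    for _ in range(T)
])
grid = np.linspace(lo, hi, 11)
cdf_values = ecdf(sub_medians, grid)
```

Computing the weights in log space avoids overflow of $e^{\varepsilon u/2}$ for large samples, and zero-width intervals (ties) receive probability zero automatically.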

Theorems & Definitions (31)

  • Definition 2.1: Asymptotically valid and tight confidence intervals
  • Theorem 2.2: Adapted from Theorem 2.2.1 in politis1999subsampling
  • Theorem 2.3: Corollary 2.4.1 in politis1999subsampling
  • Definition 2.4: Differential privacy
  • Theorem 3.1
  • Definition 3.2: $\tau_n$-consistency
  • Theorem 3.3
  • Corollary 3.4
  • Definition A.1: Differential privacy
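Definition 2.4 refers to differential privacy; its standard $(\varepsilon,\delta)$ form is reproduced below as a reference point. This assumes neighboring datasets $D, D'$ differ in a single record; the paper's definition may fix a specific neighboring relation (for example, replace-one versus add/remove).

```latex
\Pr\big[\mathcal{M}(D) \in S\big] \;\le\; e^{\varepsilon}\,\Pr\big[\mathcal{M}(D') \in S\big] + \delta
\qquad \text{for all neighboring } D, D' \text{ and all measurable } S .
```

Setting $\delta = 0$ gives pure $\varepsilon$-DP, which is the $(5,0)$-DP and $(2,0)$-DP regime used in Figures 1 and 3.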