Resampling methods for private statistical inference

Karan Chadha, John Duchi, Rohith Kuditipudi

TL;DR

This work addresses the problem of constructing valid confidence sets under differential privacy for general estimators. It develops two private variants of the non-parametric bootstrap that privately aggregate results from many little bootstraps via median-like mechanisms, enabling percentile and normal-approximation confidence intervals with asymptotic guarantees. Under mild regularity conditions and typical Edgeworth-expansion assumptions, the private methods attain coverage $1 - \alpha$ up to a $\widetilde{O}(n^{-1/2})$ error and achieve substantially shorter interval widths than prior private methods in mean, median, and logistic regression tasks. The methods rely on private median aggregation to improve stability and privacy, and they extend to objective-perturbation-based M-estimation with subsampling, offering a practical framework for privacy-preserving uncertainty quantification across a broad class of estimators.

Abstract

We consider the task of constructing confidence intervals with differential privacy. We propose two private variants of the non-parametric bootstrap, which privately compute the median of the results of multiple "little" bootstraps run on partitions of the data, and we give asymptotic bounds on the coverage error of the resulting confidence intervals. For a fixed differential privacy parameter $\varepsilon$, our methods enjoy the same error rates as the non-private bootstrap to within logarithmic factors in the sample size $n$. We empirically validate the performance of our methods for mean estimation, median estimation, and logistic regression with both real and synthetic data. Our methods achieve similar coverage accuracy to existing methods (and non-private baselines) while providing notably shorter ($\gtrsim 10$ times) confidence intervals than previous approaches.
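The pipeline described in the abstract (partition the data, run a little bootstrap on each partition, privately take the median of the per-partition results) can be illustrated with a minimal sketch for the sample mean. This is not the paper's algorithm: the exponential mechanism over a fixed grid is swapped in as a generic private median, and the function names, the clipping range `[lo, hi]`, the grid size, and the number of bootstrap draws are all assumptions made for illustration. The sketch privatizes a single bootstrap quantile only; a full confidence interval would also need a private point estimate and a split of the privacy budget.

```python
import numpy as np


def little_bootstrap_quantile(part, n_total, level, n_boot, rng):
    """Little bootstrap on one partition (hypothetical helper).

    Resamples n_total points from the partition via multinomial counts and
    returns the `level` quantile of sqrt(n_total) * (resampled mean - partition mean).
    """
    b = len(part)
    center = part.mean()
    # Multinomial counts emulate drawing n_total points with replacement.
    counts = rng.multinomial(n_total, np.full(b, 1.0 / b), size=n_boot)
    boot_means = counts @ part / n_total
    return np.quantile(np.sqrt(n_total) * (boot_means - center), level)


def private_median(values, lo, hi, grid_size, eps, rng):
    """Exponential mechanism for the median over a grid on [lo, hi].

    Assumption: this stands in for the paper's median mechanism.
    Utility of grid point g is -|#{values <= g} - len(values)/2|, which
    changes by at most 1 when a single value is replaced.
    """
    grid = np.linspace(lo, hi, grid_size)
    clipped = np.sort(np.clip(values, lo, hi))
    ranks = np.searchsorted(clipped, grid, side="right")
    utility = -np.abs(ranks - len(clipped) / 2.0)
    logits = eps * utility / 2.0
    probs = np.exp(logits - logits.max())  # subtract max for numerical stability
    probs /= probs.sum()
    return rng.choice(grid, p=probs)


def private_blb_quantile(x, n_parts, level, lo, hi, eps, rng,
                         n_boot=200, grid_size=401):
    """Partition x, bootstrap each part, and privately aggregate the quantiles."""
    parts = np.array_split(rng.permutation(x), n_parts)
    qs = np.array([
        little_bootstrap_quantile(p, len(x), level, n_boot, rng) for p in parts
    ])
    return private_median(qs, lo, hi, grid_size, eps, rng)


rng = np.random.default_rng(0)
x = rng.normal(size=2000)
q_hat = private_blb_quantile(x, n_parts=20, level=0.975, lo=0.0, hi=10.0,
                             eps=4.0, rng=rng)
```

Because each individual's record falls in exactly one partition, replacing one record changes at most one of the per-partition quantiles, which is what lets a median-type mechanism with sensitivity 1 be applied to the aggregate.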

Paper Structure

This paper contains 41 sections, 31 theorems, 148 equations, 16 figures, 5 algorithms.

Key Result

Proposition 1

Let $y(t)$ and $z(t) \in \mathbb{R}^k$, $t = 1, \ldots, T$, be sequences of vectors satisfying $d_{\textup{ham}}(y(t), z(t)) \le 1$ for each $t$. Let $\varepsilon > 0$ and $\xi_0 \sim \mathsf{Lap}(\frac{k}{2}, \frac{2}{\varepsilon})$ and $\xi_t \stackrel{\rm iid}{\sim} \mathsf{Lap}(0, \frac{4}{\vare

Figures (16)

  • Figure 1: Coverage and confidence interval widths for all experiments. Each uses $\varepsilon_{\textup{total}} = 8$.
  • Figure 2: Histogram of widths of confidence intervals constructed for mean estimation for $n = 300$
  • Figure 3: Hyperparameter sensitivity of \ref{algorithm:blb-var} on mean estimation with $\varepsilon_{\textup{total}} = 8$
  • Figure 4: Hyperparameter sensitivity of \ref{algorithm:blb-var} on mean estimation with $\varepsilon_{\textup{total}} = 8$
  • Figure 5: Coverage rates for \ref{algorithm:blb-quant} on mean estimation with $\varepsilon_{\textup{total}} = 8$. Each plot varies $c$ in the interval sets $I_t = [-c t /\sqrt{n}, ct / \sqrt{n}]$, fixing multiplier $K$.
  • ...and 11 more figures

Theorems & Definitions (33)

  • Proposition 1
  • Lemma 3.1
  • Theorem 1
  • Corollary 3.1
  • Theorem 2
  • Definition 4.1
  • Proposition 2
  • Corollary 4.1
  • Proposition 3
  • Corollary 4.2
  • ...and 23 more