Resampling methods for private statistical inference

Karan Chadha; John Duchi; Rohith Kuditipudi

TL;DR

This work addresses the problem of constructing valid confidence sets under differential privacy for general estimators. It develops two private variants of the non-parametric bootstrap that privately aggregate results from many little bootstraps via median-like mechanisms, enabling percentile and normal-approximation confidence intervals with asymptotic guarantees. Under mild regularity conditions and typical Edgeworth-expansion assumptions, the private methods attain coverage $1 - \alpha$ up to a $\widetilde{O}(n^{-1/2})$ error and achieve substantially shorter interval widths than prior private methods in mean, median, and logistic regression tasks. The methods rely on private median aggregation to improve stability and privacy, and they extend to objective-perturbation-based M-estimation with subsampling, offering a practical framework for privacy-preserving uncertainty quantification across a broad class of estimators.
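For reference, the two interval types mentioned above can be sketched without privacy. The Python below is a minimal illustration, not the paper's code. It builds a basic (reverse) percentile interval from quantiles of $\hat\theta^* - \hat\theta$ and a normal-approximation interval from the bootstrap standard error. The function names, the default $\alpha = 0.1$, and the $B = 2000$ resamples are our own assumptions.

```python
import numpy as np
from statistics import NormalDist


def bootstrap_replicates(x, stat, B, rng):
    """Evaluate stat on B resamples of x drawn with replacement."""
    x = np.asarray(x, dtype=float)
    idx = rng.integers(0, len(x), size=(B, len(x)))
    return np.array([stat(x[i]) for i in idx])


def percentile_ci(x, stat, alpha=0.1, B=2000, seed=0):
    """Basic (reverse) percentile interval from quantiles of theta* - theta_hat."""
    rng = np.random.default_rng(seed)
    theta_hat = stat(np.asarray(x, dtype=float))
    deltas = bootstrap_replicates(x, stat, B, rng) - theta_hat
    lo_q, hi_q = np.quantile(deltas, [alpha / 2, 1 - alpha / 2])
    return theta_hat - hi_q, theta_hat - lo_q


def normal_ci(x, stat, alpha=0.1, B=2000, seed=0):
    """Normal-approximation interval using the bootstrap standard error."""
    rng = np.random.default_rng(seed)
    theta_hat = stat(np.asarray(x, dtype=float))
    se = bootstrap_replicates(x, stat, B, rng).std(ddof=1)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return theta_hat - z * se, theta_hat + z * se


x = np.random.default_rng(1).normal(size=300)
print(percentile_ci(x, np.mean), normal_ci(x, np.mean))
```

Neither interval is private: each resample touches every record, which is why the private variants instead aggregate many little bootstraps run on disjoint partitions.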

Abstract

We consider the task of constructing confidence intervals with differential privacy. We propose two private variants of the non-parametric bootstrap, which privately compute the median of the results of multiple "little" bootstraps run on partitions of the data, and we give asymptotic bounds on the coverage error of the resulting confidence intervals. For a fixed differential privacy parameter $\varepsilon$, our methods enjoy the same coverage error rates as the non-private bootstrap, to within logarithmic factors in the sample size $n$. We empirically validate the performance of our methods for mean estimation, median estimation, and logistic regression with both real and synthetic data. Our methods achieve coverage accuracy similar to that of existing methods (and non-private baselines) while providing confidence intervals that are notably shorter (by a factor of $\gtrsim 10$) than those of previous approaches.
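The partition-then-aggregate pattern can be sketched as follows, assuming each of $k$ disjoint partitions of size $m$ runs a little bootstrap to estimate a quantile of $\sqrt{m}(\hat\theta^* - \hat\theta)$. Because the partitions are disjoint, changing one record changes only one partition's output, which is what makes median aggregation stable. For the private median we substitute a standard exponential mechanism over a fixed, data-independent grid. This is an illustrative stand-in, not necessarily either of the paper's mechanisms, and all names, the grid, and the parameter values are hypothetical.

```python
import numpy as np


def little_bootstrap_quantile(part, stat, q, B, rng):
    """q-quantile of sqrt(m) * (theta* - theta_hat) from B resamples of one partition."""
    m = len(part)
    theta_hat = stat(part)
    idx = rng.integers(0, m, size=(B, m))              # B resamples with replacement
    reps = np.array([stat(part[i]) for i in idx])
    return float(np.quantile(np.sqrt(m) * (reps - theta_hat), q))


def exp_mech_median(values, grid, eps, rng):
    """Exponential-mechanism median over a fixed grid (stand-in, not the paper's).

    Utility u(c) = -|#{values < c} - k/2|. Replacing one value changes each
    count by at most 1, so the utility has sensitivity 1.
    """
    values = np.asarray(values, dtype=float)
    k = len(values)
    below = (values[None, :] < grid[:, None]).sum(axis=1)
    logits = eps * -np.abs(below - k / 2) / 2.0        # exp(eps * u / (2 * sensitivity))
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return float(grid[rng.choice(len(grid), p=probs)])


def private_bootstrap_quantile(x, stat, q, k, eps, grid, B=200, seed=0):
    """Split x into k disjoint parts, bootstrap each, then take a private median."""
    rng = np.random.default_rng(seed)
    parts = np.array_split(rng.permutation(np.asarray(x, dtype=float)), k)
    local = [little_bootstrap_quantile(p, stat, q, B, rng) for p in parts]
    return exp_mech_median(local, grid, eps, rng)


x = np.random.default_rng(1).normal(size=2000)
grid = np.linspace(-5, 5, 201)
q_hat = private_bootstrap_quantile(x, np.mean, q=0.95, k=20, eps=4.0, grid=grid)
print(q_hat)
```

Under the usual $\sqrt{n}$ scaling, `q_hat / np.sqrt(len(x))` estimates the corresponding quantile of $\hat\theta - \theta$ at the full sample size. A complete private interval would also need a private point estimate and a split of the privacy budget, which this sketch omits.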

Paper Structure

This paper contains 41 sections, 31 theorems, 148 equations, 16 figures, and 5 algorithms.

Introduction
Related Work
Preliminaries and notation
Notation.
Private Confidence Intervals
Private median algorithms
A private percentile bootstrap
Private error estimation and normal approximation
Validity and Asymptotic Rates
Asymptotic Normality and Consistency
Edgeworth expansions and asymptotic rates
On higher-order accuracy and studentization
Empirical risk minimization, objective perturbation, and accuracy
Subsampling consistency for objective perturbation
Consistency via Edgeworth expansions and objective perturbation
...and 26 more sections

Key Result

Proposition 1

Let $y(t)$ and $z(t) \in \mathbb{R}^k$, $t = 1, \ldots, T$, be sequences of vectors satisfying $d_{\textup{ham}}(y(t), z(t)) \le 1$ for each $t$. Let $\varepsilon > 0$ and $\xi_0 \sim \mathsf{Lap}(\frac{k}{2}, \frac{2}{\varepsilon})$ and $\xi_t \stackrel{\rm iid}{\sim} \mathsf{Lap}(0, \frac{4}{\vare
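The noise in Proposition 1 (a threshold perturbed as $\xi_0 \sim \mathsf{Lap}(\frac{k}{2}, \frac{2}{\varepsilon})$ and fresh $\mathsf{Lap}(0, \frac{4}{\varepsilon})$ noise at each step) matches the standard AboveThreshold (sparse vector) test. Since the statement above is truncated, the sketch below assumes that reading: each $y(t)$ is a 0/1 vector, and the mechanism reports the first $t$ at which the noisy count $\sum_j y(t)_j + \xi_t$ reaches $\xi_0$. The function name and the stopping rule are assumptions, not the paper's exact algorithm.

```python
import numpy as np


def above_threshold(ys, eps, rng):
    """Return the first 1-based index t whose noisy count clears a noisy k/2 threshold.

    Assumes each y(t) is a 0/1 vector of length k, so sum(y(t)) changes by at
    most 1 when y(t) and z(t) differ in a single coordinate (Hamming distance 1).
    """
    k = len(ys[0])
    xi0 = rng.laplace(loc=k / 2, scale=2 / eps)        # xi_0 ~ Lap(k/2, 2/eps)
    for t, y in enumerate(ys, start=1):
        xi_t = rng.laplace(loc=0.0, scale=4 / eps)     # fresh xi_t ~ Lap(0, 4/eps)
        if np.sum(y) + xi_t >= xi0:
            return t
    return None                                         # no query cleared the threshold


k = 50
counts = [0, 5, 10, 20, 30, 40, 50]
ys = [np.array([1] * c + [0] * (k - c)) for c in counts]
t_hat = above_threshold(ys, eps=100.0, rng=np.random.default_rng(0))
print(t_hat)
```

With a very large $\varepsilon$ the noise is negligible, so the call returns the first index whose count exceeds $k/2 = 25$; at realistic $\varepsilon$ the reported index is randomized around that point.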

Figures (16)

Figure 1: Coverage and confidence interval widths for all experiments. Each uses $\varepsilon_{\textup{total}} = 8$.
Figure 2: Histogram of widths of confidence intervals constructed for mean estimation with $n = 300$.
Figure 3: Hyperparameter sensitivity of Algorithm blb-var on mean estimation with $\varepsilon_{\textup{total}} = 8$.
Figure 4: Hyperparameter sensitivity of Algorithm blb-var on mean estimation with $\varepsilon_{\textup{total}} = 8$.
Figure 5: Coverage rates for Algorithm blb-quant on mean estimation with $\varepsilon_{\textup{total}} = 8$. Each plot varies $c$ in the interval sets $I_t = [-c t /\sqrt{n}, ct / \sqrt{n}]$, fixing the multiplier $K$.
...and 11 more figures
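One plausible way to produce vectors $y(t)$, consistent with the interval sets $I_t = [-ct/\sqrt{n}, ct/\sqrt{n}]$ in Figure 5, is to let $y(t)_j$ indicate whether the $j$-th partition's quantity falls in $I_t$. This is our assumption, not a statement of the paper's construction. The sketch checks two properties: the counts are nondecreasing in $t$ because the intervals are nested, and changing one partition's value changes at most one coordinate of each $y(t)$, which is the Hamming condition in Proposition 1. The values, $c$, $n$, and $T$ are hypothetical.

```python
import numpy as np


def interval_indicators(values, c, n, T):
    """Row t-1 holds y(t), with y(t)_j = 1{values_j in I_t}, I_t = [-c t/sqrt(n), c t/sqrt(n)]."""
    v = np.abs(np.asarray(values, dtype=float))
    half_widths = c * np.arange(1, T + 1) / np.sqrt(n)
    return (v[None, :] <= half_widths[:, None]).astype(int)


rng = np.random.default_rng(0)
n, k, c, T = 1000, 40, 0.5, 20
values = rng.normal(scale=1 / np.sqrt(n), size=k)   # hypothetical per-partition quantities
Y = interval_indicators(values, c, n, T)

neighbor = values.copy()
neighbor[3] = 10.0                                   # change a single partition's value
Z = interval_indicators(neighbor, c, n, T)

counts = Y.sum(axis=1)
print(counts)                                        # nondecreasing because the I_t are nested
print((Y != Z).sum(axis=1).max())                    # per-t Hamming distance, at most 1
```

Feeding the rows of `Y` to a threshold test at $k/2$ would return the smallest $t$ for which roughly half of the partitions land in $I_t$, which is one way a median-like quantity can be located privately.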

Theorems & Definitions (33)

Proposition 1
Lemma 3.1
Theorem 1
Corollary 3.1
Theorem 2
Definition 4.1
Proposition 2
Corollary 4.1
Proposition 3
Corollary 4.2
...and 23 more
