Unbiased Statistical Estimation and Valid Confidence Intervals Under Differential Privacy

Christian Covington, Xi He, James Honaker, Gautam Kamath

TL;DR

This work presents a method for producing unbiased parameter estimates and valid confidence intervals under the constraints of differential privacy, a formal framework for limiting individual information leakage from sensitive data. The results hold in high dimensions and for any estimation procedure that behaves nicely under the bootstrap.

Abstract

We present a method for producing unbiased parameter estimates and valid confidence intervals under the constraints of differential privacy, a formal framework for limiting individual information leakage from sensitive data. Prior work in this area is limited in that it is tailored to calculating confidence intervals for specific statistical procedures, such as mean estimation or simple linear regression. While other recent work can produce confidence intervals for more general sets of procedures, they either yield only approximately unbiased estimates, are designed for one-dimensional outputs, or assume significant user knowledge about the data-generating distribution. Our method induces distributions of mean and covariance estimates via the bag of little bootstraps (BLB) and uses them to privately estimate the parameters' sampling distribution via a generalized version of the CoinPress estimation algorithm. If the user can bound the parameters of the BLB-induced parameters and provide heavier-tailed families, the algorithm produces unbiased parameter estimates and valid confidence intervals which hold with arbitrarily high probability. These results hold in high dimensions and for any estimation procedure which behaves nicely under the bootstrap.
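
To make the bag of little bootstraps (BLB) step concrete, here is a minimal, non-private sketch for OLS coefficients. It is an illustration under stated assumptions, not the authors' implementation: the function names (`weighted_ols`, `blb_ols`), the synthetic data, and the choices of `k` and `r` are all hypothetical, and the final aggregation is a plain average where the paper instead applies a private estimator (a generalized CoinPress).

```python
import numpy as np

def weighted_ols(X, y, w):
    """Weighted least squares: solve (X^T W X) beta = X^T W y."""
    Xw = X * w[:, None]
    return np.linalg.solve(Xw.T @ X, Xw.T @ y)

def blb_ols(X, y, k, r, rng):
    """Return, for each of k disjoint subsets, the mean and covariance
    of r bootstrap OLS estimates (hypothetical helper, not from the paper)."""
    n, d = X.shape
    subsets = np.array_split(rng.permutation(n), k)
    means, covs = [], []
    for idx in subsets:
        Xs, ys = X[idx], y[idx]
        b = len(idx)
        estimates = np.empty((r, d))
        for j in range(r):
            # Resample n points from the b-point subset, stored as counts.
            w = rng.multinomial(n, np.full(b, 1.0 / b)).astype(float)
            estimates[j] = weighted_ols(Xs, ys, w)
        means.append(estimates.mean(axis=0))
        covs.append(np.cov(estimates, rowvar=False))
    return np.array(means), np.array(covs)

# Synthetic data, assumed purely for illustration.
rng = np.random.default_rng(0)
n, d = 2000, 3
X = rng.normal(size=(n, d))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(size=n)

means, covs = blb_ols(X, y, k=20, r=50, rng=rng)
# Non-private aggregation; the paper replaces this step with a private one.
beta_hat = means.mean(axis=0)
```

Each subset contributes one mean vector and one covariance matrix, so changing a single record affects only one of the `k` pairs. That is the property that makes the per-subset summaries a natural input to a private mean estimator.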

Paper Structure

This paper contains 46 sections, 13 theorems, 55 equations, 7 figures, 2 tables, and 4 algorithms.

Key Result

Lemma 2.2

Let $\mathcal{M}: \mathcal{X}^n \rightarrow \mathcal{Y}$ and $\mathcal{M}': \mathcal{X}^n \rightarrow \mathcal{Z}$ be such that $\mathcal{M}$ satisfies $\rho$-zCDP and $\mathcal{M}'$ satisfies $\rho'$-zCDP. Define $\mathcal{M}'' : \mathcal{X}^n \rightarrow \mathcal{Y} \times \mathcal{Z}$ by $\mathcal{M}''(x) = (\mathcal{M}(x), \mathcal{M}'(x))$. Then $\mathcal{M}''$ satisfies $(\rho + \rho')$-zCDP.
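
Composition results of this kind are typically used to split a privacy budget across several releases. The sketch below illustrates the bookkeeping under two standard facts from BS16: a Gaussian mechanism with $\ell_2$-sensitivity $\Delta$ and noise scale $\sigma$ satisfies $\Delta^2 / (2\sigma^2)$-zCDP, and zCDP parameters add under composition. The helper names, the clipping bounds, and the replace-one notion of neighboring data sets are assumptions made for this example, not details taken from the paper.

```python
import math
import numpy as np

def gaussian_sigma(sensitivity, rho):
    """Noise scale for which the Gaussian mechanism satisfies rho-zCDP."""
    return sensitivity / math.sqrt(2.0 * rho)

def private_clipped_mean(x, lo, hi, rho, rng):
    """Release the mean of x clipped to [lo, hi] under rho-zCDP.
    Assumes replace-one neighbors, so the sensitivity is (hi - lo) / n."""
    n = len(x)
    clipped = np.clip(x, lo, hi)
    sigma = gaussian_sigma((hi - lo) / n, rho)
    return clipped.mean() + rng.normal(0.0, sigma)

# Synthetic data, assumed purely for illustration.
rng = np.random.default_rng(1)
x = rng.normal(loc=0.3, scale=1.0, size=10_000)

rho, rho_prime = 0.05, 0.05
mean_release = private_clipped_mean(x, -5.0, 5.0, rho, rng)
second_moment_release = private_clipped_mean(x**2, 0.0, 25.0, rho_prime, rng)

# Releasing both values together costs the sum of the two budgets.
total_rho = rho + rho_prime
```

Because the budgets simply add, a total of $\rho = 0.1$ can be divided between the two statistics in any proportion; the even split here is arbitrary.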

Figures (7)

  • Figure 1: Distribution of OLS coefficient estimates (a-c) and 95% confidence intervals (d-f) under different levels of clipping of $y$. Non-clipped distribution in green, clipped distribution in orange.
  • Figure 2: OLS: Distribution of coefficient estimates and 95% confidence intervals.
  • Figure 3: OLS: BLB estimates from a single run
  • Figure 4: Distribution of coefficient estimates and 95% confidence intervals for $k = 5{,}000, d = 10, \rho = 0.1$ for multivariate Laplace distribution
  • Figure 5: Logistic Regression: Distribution of coefficient estimates and 95% confidence intervals
  • ...and 2 more figures

Theorems & Definitions (31)

  • Definition 2.1: Zero-concentrated differential privacy (zCDP) [BS16]
  • Lemma 2.2: Composition of zCDP [BS16]
  • Lemma 2.3: Postprocessing of zCDP [BS16]
  • Theorem 2.8
  • Theorem 2.9
  • Theorem 2.10
  • Theorem 2.11
  • Theorem 2.12: Confidence Region (valid with high probability)
  • Definition A.1: Neighboring data sets
  • Definition A.2: Rényi divergence [Ren61]
  • ...and 21 more