Differentially Private Confidence Intervals for Proportions under Stratified Random Sampling

Shurong Lin, Mark Bun, Marco Gaboardi, Eric D. Kolaczyk, Adam Smith

TL;DR

The paper addresses the private release of confidence intervals (CIs) for population proportions under stratified sampling. It introduces three differentially private (DP) CI algorithms, each analyzed under one of two adjacency notions suited to stratified designs. The algorithms use Gaussian mechanisms under ρ-zCDP to privatize either stratum-level or overall estimates. One of them, StrNz-PrivSz, extends to the case where sample sizes are themselves private, using conditional moments and Taylor expansions to handle ratio-based estimators. Theoretical results establish privacy guarantees and asymptotic coverage. Extensive simulations and two applications to 1940 Census data show how the privacy budget affects interval width and coverage, and offer practical guidance on method selection. Overall, the work advances design-based differential privacy for survey inference and informs practitioners on balancing privacy against interval precision.

Abstract

Confidence intervals are a fundamental tool for quantifying the uncertainty of parameters of interest. With the increase of data privacy awareness, developing a private version of confidence intervals has gained growing attention from both statisticians and computer scientists. Differential privacy is a state-of-the-art framework for analyzing privacy loss when releasing statistics computed from sensitive data. Recent work has been done around differentially private confidence intervals, yet to the best of our knowledge, rigorous methodologies on differentially private confidence intervals in the context of survey sampling have not been studied. In this paper, we propose three differentially private algorithms for constructing confidence intervals for proportions under stratified random sampling. We articulate two variants of differential privacy that make sense for data from stratified sampling designs, analyzing each of our algorithms within one of these two variants. We establish analytical privacy guarantees and asymptotic properties of the estimators. In addition, we conduct simulation studies to evaluate the proposed private confidence intervals, and two applications to the 1940 Census data are provided.

Paper Structure

This paper contains 30 sections, 15 theorems, 106 equations, 4 figures, 6 tables, 3 algorithms.

Key Result

Proposition 1

Let $q: {\cal X}^* \rightarrow \mathbb{R}$ be a sensitivity-$\Delta$ query. Consider the mechanism $M: {\cal X}^* \rightarrow \mathbb{R}$ that on input $x$, releases a sample from $N(q(x), \Delta^2/(2\rho))$. Then, $M$ satisfies $\rho$-zCDP.
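As an illustration of Proposition 1, the following is a minimal sketch of the Gaussian mechanism with noise variance $\Delta^2/(2\rho)$. It is not the paper's implementation: the function name `gaussian_mechanism_zcdp`, the example values, and the choice of sensitivity $1/n$ (a single proportion with a fixed, public sample size $n$, where neighboring datasets differ in one record) are assumptions made here for illustration only.

```python
import math
import random


def gaussian_mechanism_zcdp(value, sensitivity, rho, rng=None):
    """Release value + N(0, sensitivity^2 / (2 * rho)) noise.

    Per Proposition 1, this satisfies rho-zCDP when `sensitivity`
    is a valid sensitivity bound for the query that produced `value`.
    Returns the noisy value and the noise standard deviation.
    """
    if sensitivity <= 0 or rho <= 0:
        raise ValueError("sensitivity and rho must be positive")
    rng = rng or random.Random()
    sigma = sensitivity / math.sqrt(2.0 * rho)  # std dev of the added noise
    return value + rng.gauss(0.0, sigma), sigma


# Hypothetical example (values are illustrative, not from the paper):
# a sample proportion from n = 1000 binary responses. Assuming n is fixed
# and public, changing one record moves the proportion by at most 1/n.
n = 1000
p_hat = 0.505
noisy_p, sigma = gaussian_mechanism_zcdp(
    p_hat, sensitivity=1.0 / n, rho=0.5, rng=random.Random(0)
)
print(f"noise std dev: {sigma:.4f}, noisy proportion: {noisy_p:.4f}")
```

Note that the noise scale shrinks with both the sample size (through the sensitivity) and the privacy budget $\rho$; a smaller $\rho$ means stronger privacy and more noise.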

Figures (4)

  • Figure 1: Q-Q plots: Theoretical versus sample distributions of $\tilde{p}$ with 20 strata and $p = 0.505$ (resulting from $p_h \sim \textup{Uniform}(0.4, 0.6)$), based on 10,000 repetitions each.
  • Figure 2: Setup: 20 strata and $p = 0.505$ ($p_h \sim \textup{Uniform}(0.4, 0.6)$) with 10,000 repetitions. Panel (a) shows the empirical coverage, with the black solid line indicating the nominal confidence level of 90%. Error bars of one standard deviation are shown for coverage. The average width and width ratio are displayed in (b), with the non-private interval as the benchmark. Error bars for width are too small to be visible in the plots and are therefore not shown.
  • Figure 3: The empirical coverage with error bars, average width and width ratio of DP-CIs of the unemployment rate.
  • Figure 4: The empirical coverage with error bars, average width and width ratio of DP-CIs of the difference of the above-national-income-level proportions between black and white males with valid income values.

Theorems & Definitions (31)

  • Definition 1: $\rho$-zCDP
  • Definition 2: Sensitivity
  • Proposition 1: Gaussian Mechanism of $\rho$-zCDP
  • Proposition 2: Composition
  • Proposition 3: Post-processing
  • Remark 1
  • Remark 2
  • Theorem 3.1: Conditional mean and variance of a reciprocal normal distribution
  • Theorem 4.1: Privacy Guarantee
  • Theorem 4.2: Algorithm \ref{alg:strlevel}
  • ...and 21 more