Differentially Private Confidence Intervals for Proportions under Stratified Random Sampling
Shurong Lin, Mark Bun, Marco Gaboardi, Eric D. Kolaczyk, Adam Smith
TL;DR
The paper addresses the private release of confidence intervals for population proportions under stratified random sampling. It introduces three differentially private CI algorithms, each analyzed under one of two adjacency notions suited to stratified designs. The algorithms use the Gaussian mechanism under ρ-zCDP to privatize either stratum-level or overall estimates. One of them, StrNz-PrivSz, extends the approach to private sample sizes, using conditional moments and Taylor expansions to handle the resulting ratio-based estimators. Theoretical results establish privacy guarantees and asymptotic coverage, while simulations and two applications to 1940 Census data show how the privacy budget affects interval width and coverage and offer practical guidance on method selection. Overall, the work advances design-based differential privacy for survey inference and helps practitioners balance privacy against interval precision in public-data contexts.
Abstract
Confidence intervals are a fundamental tool for quantifying the uncertainty of parameters of interest. With growing awareness of data privacy, developing private versions of confidence intervals has attracted increasing attention from both statisticians and computer scientists. Differential privacy is a state-of-the-art framework for analyzing privacy loss when releasing statistics computed from sensitive data. Recent work has addressed differentially private confidence intervals, yet to the best of our knowledge, rigorous methodology for such intervals in the context of survey sampling has not been studied. In this paper, we propose three differentially private algorithms for constructing confidence intervals for proportions under stratified random sampling. We articulate two variants of differential privacy that make sense for data from stratified sampling designs, and we analyze each of our algorithms under one of these two variants. We establish analytical privacy guarantees and asymptotic properties of the estimators. In addition, we conduct simulation studies to evaluate the proposed private confidence intervals and provide two applications to the 1940 Census data.
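
To make the general idea concrete, the following is a minimal sketch of how stratum-level proportions might be privatized with the Gaussian mechanism under ρ-zCDP and combined into a confidence interval. It is not a reproduction of any of the paper's three algorithms. It assumes that stratum sample sizes and population sizes are public, that adjacent datasets differ in the value of one record within a single stratum (so each sample proportion has sensitivity 1/n_h and the strata compose in parallel), and that a normal approximation with the noise variance added to the sampling variance is adequate. The function names `gaussian_sigma` and `private_stratified_ci` are hypothetical.

```python
import math
import random
from statistics import NormalDist


def gaussian_sigma(sensitivity, rho):
    """Noise scale giving rho-zCDP for the Gaussian mechanism:
    sigma = sensitivity / sqrt(2 * rho)."""
    return sensitivity / math.sqrt(2.0 * rho)


def private_stratified_ci(counts, sample_sizes, pop_sizes, rho, alpha=0.05, rng=None):
    """Illustrative sketch only (assumptions stated in the text above).

    counts[h]       : number of sampled units in stratum h with the attribute
    sample_sizes[h] : n_h, assumed public
    pop_sizes[h]    : N_h, assumed public
    Returns (estimate, lower, upper).
    """
    rng = rng or random.Random()
    total_pop = sum(pop_sizes)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)

    estimate = 0.0
    variance = 0.0
    for x_h, n_h, N_h in zip(counts, sample_sizes, pop_sizes):
        w_h = N_h / total_pop                    # stratum weight W_h = N_h / N
        sigma_h = gaussian_sigma(1.0 / n_h, rho)  # proportion has sensitivity 1/n_h
        noisy_p = x_h / n_h + rng.gauss(0.0, sigma_h)

        # Plug-in sampling variance from the noisy proportion (post-processing),
        # clipped to [0, 1] so that p * (1 - p) stays non-negative.
        p_clip = min(1.0, max(0.0, noisy_p))
        sampling_var = (1.0 - n_h / N_h) * p_clip * (1.0 - p_clip) / (n_h - 1)

        estimate += w_h * noisy_p
        # Widen the interval to account for the injected Gaussian noise.
        variance += w_h ** 2 * (sampling_var + sigma_h ** 2)

    half_width = z * math.sqrt(variance)
    return estimate, max(0.0, estimate - half_width), min(1.0, estimate + half_width)
```

For example, `private_stratified_ci([30, 50], [100, 100], [1000, 3000], rho=0.5)` returns a noisy estimate near the non-private stratified estimate of 0.45 together with interval endpoints. Decreasing `rho` increases the noise scale and therefore widens the interval, which is the privacy versus precision trade-off described above.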
