Differentially Private Finite Population Estimation via Survey Weight Regularization

Jeremy Seeman, Yajuan Si, Jerome P Reiter

TL;DR

This work shows that optimal strategies for releasing DP survey-weighted mean income estimates require orders-of-magnitude less noise than naively using the original survey weights without modification, and develops a differentially private method for estimating finite population quantities.

Abstract

In general, it is challenging to release differentially private versions of survey-weighted statistics with low error for acceptable privacy loss. This is because weighted statistics from complex sample survey data can be more sensitive to individual survey response and weight values than unweighted statistics, resulting in differentially private mechanisms that can add substantial noise to the unbiased estimate of the finite population quantity. On the other hand, simply disregarding the survey weights adds noise to a biased estimator, which also can result in an inaccurate estimate. Thus, the problem of releasing an accurate survey-weighted estimate essentially involves a trade-off among bias, precision, and privacy. We leverage this trade-off to develop a differentially private method for estimating finite population quantities. The key step is to privately estimate a hyperparameter that determines how much to regularize or shrink survey weights as a function of privacy loss. We illustrate the differentially private finite population estimation using the Panel Study of Income Dynamics. We show that optimal strategies for releasing DP survey-weighted mean income estimates require orders-of-magnitude less noise than naively using the original survey weights without modification.
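The trade-off described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration, not the authors' algorithm: the linear interpolation between the original weights ($\lambda = 0$) and equal weights $N/n$ ($\lambda = 1$), the names `shrink_weights` and `dp_weighted_mean`, the bounds `U_y` and `U_w`, replace-one-record adjacency, and treating $N$ and $n$ as public. The sketch also takes $\lambda$ as given, whereas the paper estimates it privately. Noise is calibrated with the standard Gaussian mechanism result that $\sigma = \Delta / \sqrt{2\rho}$ satisfies $\rho$-zCDP for a statistic with sensitivity $\Delta$.

```python
import numpy as np


def shrink_weights(w, lam, N):
    """Interpolate between original weights (lam=0) and equal weights N/n (lam=1).

    Hypothetical regularization form, used only for illustration.
    """
    w = np.asarray(w, dtype=float)
    n = w.size
    return (1.0 - lam) * w + lam * (N / n)


def dp_weighted_mean(y, w, lam, N, U_y, U_w, rho, rng):
    """Release a noisy weighted mean under rho-zCDP via the Gaussian mechanism.

    Assumes responses lie in [0, U_y], weights lie in [0, U_w], N and n are
    public, and adjacent datasets differ by replacing one record.
    Returns (noisy_estimate, sigma).
    """
    y = np.clip(np.asarray(y, dtype=float), 0.0, U_y)
    w = np.clip(np.asarray(w, dtype=float), 0.0, U_w)
    n = y.size

    w_lam = shrink_weights(w, lam, N)
    estimate = np.sum(w_lam * y) / N

    # Largest possible regularized weight, and the resulting sensitivity:
    # replacing one record changes the sum by at most U_y * max_w.
    max_w = (1.0 - lam) * U_w + lam * (N / n)
    sensitivity = U_y * max_w / N

    # Gaussian mechanism: sigma = sensitivity / sqrt(2 * rho) gives rho-zCDP.
    sigma = sensitivity / np.sqrt(2.0 * rho)
    return estimate + rng.normal(0.0, sigma), sigma


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = rng.uniform(0.0, 1.0, size=10)
    w = rng.uniform(50.0, 500.0, size=10)
    for lam in (0.0, 0.5, 1.0):
        _, sigma = dp_weighted_mean(y, w, lam, N=1000, U_y=1.0, U_w=500.0,
                                    rho=0.5, rng=rng)
        print(f"lambda={lam:.1f}  noise sd={sigma:.3f}")
```

Under these assumptions, larger $\lambda$ lowers the noise standard deviation whenever `U_w` exceeds $N/n$, but moves the estimator toward the unweighted mean, which is biased when the weights are informative. That is the bias, precision, and privacy tension the abstract refers to.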

Paper Structure

This paper contains 17 sections, 6 theorems, 48 equations, 10 figures, 1 table, 2 algorithms.

Key Result

Lemma 1

Consider the loss function $\ell(\lambda; \bm{y}, \bm{w})$ in Eq. (eq:loss_tri). As a function of $\lambda$, $\ell(\lambda; \bm{y}, \bm{w})$ is minimized by

Figures (10)

  • Figure 1: Minimum feasible values for $\left| \hat{\theta}_0 - \hat{\theta} \right|$ as a function of sample size $n$, weight ratio $U_W n / N$, and privacy loss $\rho$, where $\theta$ is the mean of a binary variable in a population of size $N=10^8$.
  • Figure 2: Plot of survey weights (x-axis) and $\texttt{inc3}$ (y-axis), with univariate histograms on the margins and a spline estimate of the central tendency in blue.
  • Figure 3: Histograms of logarithm transformed survey weights for respondents above and below the 2019 poverty lines (left and right, respectively).
  • Figure 4: Theoretical minimum AWD for which $\lambda^* < 1$ (y-axis), i.e., for which the survey weighting design is not ignorable under $\rho$-zCDP, as a function of sample size $n$ (colored lines) and privacy loss budget $\rho$. Subplots and horizontal dashed lines refer to realized AWDs for two variables: $\texttt{inc3}$ (left) and $\texttt{pov}$ (right).
  • Figure 5: Realized noise-to-signal (DP mean square error divided by non-DP mean estimate, y-axis) as a function of $\lambda$ (x-axis) for different values of privacy loss budget $\rho_2$ (colored lines). Subplots are ordered with decreasing correlation between response variable and survey weights. Points refer to theoretical minimum values, which depend on confidential data and do not satisfy DP.
  • ...and 5 more figures

Theorems & Definitions (13)

  • Definition 1: Adjacency
  • Definition 2: $\rho$-zero-concentrated differential privacy [bun_concentrated_2016]
  • Definition 3: Gaussian Mechanism [bun_concentrated_2016]
  • Definition 4: Exponential Mechanism [mcsherry_mechanism_2007]
  • Lemma 1
  • Theorem 1
  • Theorem 2
  • proof
  • Theorem 3
  • proof
  • ...and 3 more
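
For readers unfamiliar with the exponential mechanism listed above, the following is a generic sketch of the standard construction: a candidate $r$ is sampled with probability proportional to $\exp(\varepsilon\, u(r) / (2\Delta_u))$, which satisfies $\varepsilon$-differential privacy when $\Delta_u$ bounds the sensitivity of the utility $u$. The function names are hypothetical. Using a grid of $\lambda$ values as candidates is only a plausible illustration: the paper's actual utility function, sensitivity bound, and privacy accounting are not shown in this summary.

```python
import numpy as np


def exponential_mechanism_probs(utilities, epsilon, sensitivity):
    """Selection probabilities proportional to exp(epsilon * u / (2 * sensitivity))."""
    u = np.asarray(utilities, dtype=float)
    logits = epsilon * u / (2.0 * sensitivity)
    logits -= logits.max()  # subtract the max for numerical stability
    p = np.exp(logits)
    return p / p.sum()


def exponential_mechanism(candidates, utilities, epsilon, sensitivity, rng):
    """Sample one candidate according to the exponential mechanism."""
    probs = exponential_mechanism_probs(utilities, epsilon, sensitivity)
    idx = rng.choice(len(candidates), p=probs)
    return candidates[idx]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    lam_grid = [0.0, 0.25, 0.5, 0.75, 1.0]    # hypothetical candidate grid
    utilities = [-4.0, -2.0, -1.0, -1.5, -3.0]  # made-up negative losses
    print(exponential_mechanism(lam_grid, utilities, epsilon=1.0,
                                sensitivity=1.0, rng=rng))
```

Candidates with higher utility are exponentially more likely to be selected, and smaller $\varepsilon$ flattens the distribution toward uniform.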