Differential privacy with dependent data

Valentin Roth, Marco Avella-Medina

TL;DR

The paper develops a principled DP framework for dependent data by formalizing dependence via log-Sobolev inequalities and extending the stable histogram–Winsorized mean approach to both item- and user-level DP. It provides finite-sample and in-expectation bounds under log-Sobolev dependence, demonstrates minimax-consistent rates in weakly dependent regimes, and extends the methodology to nonparametric regression and longitudinal settings. The authors further adapt these ideas to the local DP model and present simulations illustrating performance under varying dependence and privacy budgets. Overall, this work lays a foundation for DP with dependent data and offers practical estimators for a range of statistical tasks in social/health sciences and longitudinal studies.

Abstract

Dependent data underlies many statistical studies in the social and health sciences, which often involve sensitive or private information. Differential privacy (DP) and in particular user-level DP provide a natural formalization of privacy requirements for processing dependent data where each individual provides multiple observations to the dataset. However, dependence introduced, e.g., through repeated measurements challenges the existing statistical theory under DP constraints. In i.i.d. settings, noisy Winsorized mean estimators have been shown to be minimax optimal for standard (item-level) and user-level DP estimation of a mean $\mu \in \mathbb{R}^d$. Yet, their behavior on potentially dependent observations has not previously been studied. We fill this gap and show that Winsorized mean estimators can also be used under dependence for bounded and unbounded data, and can lead to asymptotic and finite sample guarantees that resemble their i.i.d. counterparts under a weak notion of dependence. For this, we formalize dependence via log-Sobolev inequalities on the joint distribution of observations. This enables us to adapt the stable histogram by Karwa and Vadhan (2018) to a non-i.i.d. setting, which we then use to estimate the private projection intervals of the Winsorized estimator. The resulting guarantees for our item-level mean estimator extend to user-level mean estimation and transfer to the local model via a randomized response histogram. Using the mean estimators as building blocks, we provide extensions to random effects models, longitudinal linear regression and nonparametric regression. Therefore, our work constitutes a first step towards a systematic study of DP for dependent data.
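The noisy Winsorized mean mentioned in the abstract can be illustrated with a minimal one-dimensional sketch. It is not the paper's algorithm: the projection interval `[lower, upper]` is assumed to be given (the paper estimates it privately with a stable histogram), Laplace noise is an assumed choice, and the function name `winsorized_mean_dp` and all parameters are hypothetical. With a fixed interval, replacing one observation changes the clipped mean by at most `(upper - lower) / n`, which is the sensitivity used to scale the noise.

```python
import random


def winsorized_mean_dp(xs, lower, upper, epsilon, rng):
    """Clip each observation to [lower, upper], average, and add Laplace noise.

    Assumption: the interval is fixed in advance, so only the noise-addition
    step is covered by the epsilon budget in this sketch.
    """
    n = len(xs)
    if n == 0 or upper <= lower or epsilon <= 0:
        raise ValueError("need non-empty data, lower < upper, and epsilon > 0")
    # Winsorize: project every observation onto the interval.
    clipped = [min(max(x, lower), upper) for x in xs]
    # Replacing one observation moves the clipped mean by at most this much.
    sensitivity = (upper - lower) / n
    scale = sensitivity / epsilon
    # The difference of two independent Exp(1/scale) draws is Laplace(0, scale).
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return sum(clipped) / n + noise


rng = random.Random(0)
sample = [-10.0, 0.0, 1.0, 2.0, 50.0]
estimate = winsorized_mean_dp(sample, lower=-1.0, upper=3.0, epsilon=1.0, rng=rng)
```

Clipping bounds the influence of any single observation regardless of how the observations depend on each other; the accuracy of the result, however, depends on how well the interval covers the bulk of the data, which is where the dependence assumptions of the paper enter.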

Paper Structure

This paper contains 64 sections, 60 theorems, 240 equations, 6 figures, 14 algorithms.

Key Result

Theorem 2.3

(Theorem 3.6, Dwork and Roth, 2014). The Laplace mechanism (Algorithm alg:lapmech) is $(\varepsilon, 0)$-DP.
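As a hedged illustration of the Laplace mechanism referenced in Theorem 2.3, the sketch below adds Laplace noise with scale `sensitivity / epsilon` to a query value. The function names `laplace_noise` and `laplace_mechanism` and the counting-query example are assumptions made for illustration and are not taken from the paper; the caller is responsible for supplying a correct sensitivity bound for the query.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent Exp(1/scale) draws is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(value, sensitivity, epsilon, rng):
    """Release value + Laplace(sensitivity / epsilon) noise."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    return value + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(0)
data = [0, 1, 1, 0, 1, 1, 1, 0]
# Counting query: replacing one record changes the count by at most 1.
true_count = sum(data)
private_count = laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.5, rng=rng)
```

A smaller `epsilon` gives a larger noise scale and therefore a stronger privacy guarantee at the cost of accuracy.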

Figures (6)

  • Figure 1: Comparison of the empirical mean $\bar{X}_n$ with the one-dimensional Winsorized mean estimator (Algorithm alg:winsmean1D), denoted $\mathcal{A}_{(\varepsilon, \delta)}$.
  • Figure 2: Comparison of $(0.1, 1/n^2)$-DP estimators $\mathcal{A}_h^\kappa$ where $\kappa$ is the radius of the projection interval and $h$ the length of the histogram bins. Note that here $\tau^\prime \asymp \sqrt{\log(2/\gamma)}$ and $\tau \asymp \sqrt{\log(2n/\gamma)}$.
  • Figure 3: $(\varepsilon, \delta)$-DP estimators $\mathcal{A}_{(\varepsilon, \delta)}$ on $X^n \in \mathbb{R}^n$ with different covariances $\Sigma^n \in \mathbb{R}^{n\times n}$.
  • Figure 4: Comparison of $(\varepsilon, \delta)$-DP and LDP estimators $\mathcal{A}^c_{(\varepsilon, \delta)}$ and $\mathcal{A}^l_{(\varepsilon, \delta)}$.
  • Figure 5: Comparison of the empirical mean $\bar{X}_n$ with $(1, 1/n^2)$-DP estimator $\mathcal{A}$ using known variance and $(2, 1/n^2)$-DP estimators $\mathcal{A}_{bi}$ and $\mathcal{A}_{coin}$ using plug-in variance estimators.
  • ...and 1 more figure

Theorems & Definitions (133)

  • Definition 2.1
  • Remark 2.2
  • Theorem 2.3
  • Theorem 2.4
  • Definition 2.5
  • Theorem 2.6
  • Remark 2.9
  • Corollary 2.10
  • Lemma 2.11
  • Lemma 3.1
  • ...and 123 more