Differential privacy with dependent data
Valentin Roth, Marco Avella-Medina
TL;DR
The paper develops a differential privacy (DP) framework for dependent data by formalizing dependence via log-Sobolev inequalities and extending the combination of a stable histogram and a Winsorized mean to both item-level and user-level DP. It provides finite-sample and in-expectation bounds under log-Sobolev dependence, shows that these guarantees resemble the minimax-optimal i.i.d. rates under weak dependence, and extends the methodology to nonparametric regression and longitudinal settings. The authors further adapt these ideas to the local DP model and present simulations illustrating performance under varying dependence and privacy budgets. Overall, this work lays a foundation for DP with dependent data and offers practical estimators for statistical tasks in the social and health sciences and in longitudinal studies.
Abstract
Dependent data underlies many statistical studies in the social and health sciences, which often involve sensitive or private information. Differential privacy (DP), and in particular user-level DP, provides a natural formalization of privacy requirements for processing dependent data where each individual contributes multiple observations to the dataset. However, dependence introduced through, e.g., repeated measurements challenges the existing statistical theory under DP constraints. In i.i.d. settings, noisy Winsorized mean estimators have been shown to be minimax optimal for standard (item-level) and user-level DP estimation of a mean $\mu \in \mathbb{R}^d$. Yet their behavior on potentially dependent observations has not previously been studied. We fill this gap and show that Winsorized mean estimators can also be used under dependence for bounded and unbounded data, and can lead to asymptotic and finite-sample guarantees that resemble their i.i.d. counterparts under a weak notion of dependence. For this, we formalize dependence via log-Sobolev inequalities on the joint distribution of observations. This enables us to adapt the stable histogram of Karwa and Vadhan (2018) to a non-i.i.d. setting, which we then use to privately estimate the projection intervals of the Winsorized estimator. The resulting guarantees for our item-level mean estimator extend to user-level mean estimation and transfer to the local model via a randomized response histogram. Using the mean estimators as building blocks, we provide extensions to random effects models, longitudinal linear regression and nonparametric regression. Our work thus constitutes a first step towards a systematic study of DP for dependent data.
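To make the idea of a noisy Winsorized mean concrete, the following is a minimal one-dimensional sketch and not the authors' estimator. It assumes the projection interval is supplied by the caller, whereas the abstract describes estimating it privately with a stable histogram, and it uses Laplace noise with the replace-one-observation notion of neighboring datasets. The function name `noisy_winsorized_mean`, its parameters, and the AR(1) toy data are all hypothetical choices made for illustration.

```python
import numpy as np


def noisy_winsorized_mean(x, lower, upper, epsilon, rng=None):
    """Illustrative epsilon-DP (item-level) mean of x after clipping to [lower, upper].

    Assumption: [lower, upper] is fixed in advance and does not depend on the data.
    """
    if not lower < upper:
        raise ValueError("lower must be strictly smaller than upper")
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = np.random.default_rng() if rng is None else rng

    x = np.asarray(x, dtype=float)
    n = x.size

    # Winsorize: project every observation onto the interval.
    clipped = np.clip(x, lower, upper)

    # Replacing one observation moves the clipped mean by at most (upper - lower) / n.
    sensitivity = (upper - lower) / n

    # Laplace mechanism: noise scale = sensitivity / epsilon.
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return clipped.mean() + noise


# Toy dependent data: a stationary Gaussian AR(1) series with mean 1 (illustrative only).
rng = np.random.default_rng(0)
n, rho = 2000, 0.5
z = rng.normal(size=n)
x = np.empty(n)
x[0] = z[0]
for t in range(1, n):
    x[t] = rho * x[t - 1] + np.sqrt(1.0 - rho**2) * z[t]
x += 1.0

estimate = noisy_winsorized_mean(x, lower=-4.0, upper=6.0, epsilon=1.0, rng=rng)
print(estimate)
```

The privacy guarantee of this sketch is a worst-case statement over neighboring datasets and so does not rely on independence. What dependence affects is accuracy: how well the clipped sample mean concentrates around $\mu$, and how reliably a data-driven interval can be found.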
