Private Estimation when Data and Privacy Demands are Correlated
Syomantak Chaudhuri, Thomas A. Courtade
TL;DR
The paper studies private mean and frequency estimation when users have heterogeneous privacy demands, formulated as heterogeneous differential privacy (HDP) in the central-DP model. It considers two correlation models: adversarial data that may be fully correlated with the privacy demands, and a weakly correlated setting obtained by randomly permuting the dataset. It introduces the HPF and HPM mechanisms, which weight users differently and inject Laplace noise calibrated to $\|\vec{w}/\bm{\epsilon}\|_{\infty}$, along with a fast heuristic, HPF-A. It provides guarantees under both PAC error and MSE and proves minimax optimality in several regimes. Experiments on UC salary data and cancer-type data complement the theory, showing that HPF-C/HPM-C and HPF-WC/HPM-WC outperform baselines and that HPF-A often performs well in practice. The work highlights a fundamental trade-off between privacy heterogeneity and estimation accuracy, identifies regimes where correlations do not preclude accurate estimation, and offers efficient, scalable algorithms for deployments with heterogeneous user privacy requirements.
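The weighting-plus-Laplace-noise idea in the TL;DR can be illustrated with a minimal Python sketch. It is not the paper's HPM or HPF mechanism: the function names `noise_scale`, `proportional_weights`, and `weighted_private_mean` are hypothetical, data are assumed to lie in $[0, 1]$, and the weights (proportional to each user's $\epsilon_i$) are an illustrative choice rather than the optimized weights the paper derives. The sketch only shows why a noise scale of $\|\vec{w}/\bm{\epsilon}\|_{\infty}$ suffices: changing user $i$'s sample moves the weighted mean by at most $w_i$, so Laplace noise of scale $b$ gives that user privacy level $w_i / b \le \epsilon_i$ whenever $b \ge \max_i w_i/\epsilon_i$.

```python
import numpy as np


def noise_scale(w, eps):
    """Laplace scale max_i w_i / eps_i, i.e. the infinity norm of w / eps."""
    w = np.asarray(w, dtype=float)
    eps = np.asarray(eps, dtype=float)
    return float(np.max(w / eps))


def proportional_weights(eps):
    """Illustrative weights w_i proportional to eps_i (an assumption, not the paper's choice)."""
    eps = np.asarray(eps, dtype=float)
    return eps / eps.sum()


def weighted_private_mean(x, eps, w, rng=None):
    """Weighted mean of x in [0, 1] plus Laplace noise of scale ||w / eps||_inf."""
    x = np.asarray(x, dtype=float)
    eps = np.asarray(eps, dtype=float)
    w = np.asarray(w, dtype=float)
    if not (x.shape == eps.shape == w.shape):
        raise ValueError("x, eps and w must have the same shape")
    if np.any(x < 0) or np.any(x > 1):
        raise ValueError("data are assumed to lie in [0, 1]")
    if np.any(eps <= 0):
        raise ValueError("privacy demands must be positive")
    if np.any(w < 0) or not np.isclose(w.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    rng = np.random.default_rng() if rng is None else rng
    # User i's sample shifts the weighted mean by at most w_i, so this
    # scale keeps the privacy loss for user i at or below eps_i.
    b = noise_scale(w, eps)
    return float(w @ x + rng.laplace(loc=0.0, scale=b))


if __name__ == "__main__":
    eps = np.array([0.1, 0.5, 1.0, 2.0])   # heterogeneous privacy demands
    x = np.array([0.2, 0.9, 0.4, 0.7])     # samples in [0, 1]
    w = proportional_weights(eps)
    print(weighted_private_mean(x, eps, w, rng=np.random.default_rng(0)))
```

With uniform weights the noise scale is driven by the most privacy-sensitive user (the smallest $\epsilon_i$); shifting weight toward users with larger $\epsilon_i$ lowers the noise but biases the estimate away from the empirical mean, which is the trade-off the summary refers to.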
Abstract
Differential Privacy (DP) is the current gold standard for ensuring privacy for statistical queries. Estimation problems under DP constraints appearing in the literature have largely focused on providing equal privacy to all users. We consider the problems of empirical mean estimation for univariate data and frequency estimation for categorical data, both subject to heterogeneous privacy constraints. Each user, contributing a sample to the dataset, is allowed to have a different privacy demand. The dataset itself is assumed to be worst-case, and we study both problems under two different formulations: first, where privacy demands and data may be correlated, and second, where correlations are weakened by random permutation of the dataset. We establish theoretical performance guarantees for our proposed algorithms under both PAC error and mean-squared error. These performance guarantees translate to minimax optimality in several instances, and experiments confirm superior performance of our algorithms over other baseline techniques.
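For orientation, the per-user privacy constraint can be written in the form commonly used for heterogeneous DP. The notation here is an assumption for illustration and may differ from the paper's formal definition: $M$ is a randomized mechanism, $\epsilon_i$ is user $i$'s privacy demand, and $\mathbf{x}, \mathbf{x}'$ are datasets that differ only in the sample of user $i$. The mechanism satisfies the heterogeneous constraint if, for every user $i$, every such pair of datasets, and every measurable set $S$ of outputs,

$$
\Pr\left[M(\mathbf{x}) \in S\right] \le e^{\epsilon_i} \, \Pr\left[M(\mathbf{x}') \in S\right].
$$

Setting $\epsilon_i = \epsilon$ for all $i$ recovers the usual $\epsilon$-DP guarantee with equal privacy for all users.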
