Private Estimation when Data and Privacy Demands are Correlated

Syomantak Chaudhuri, Thomas A. Courtade

TL;DR

The paper studies private estimation under heterogeneous privacy demands, formulating heterogeneous differential privacy (HDP) in the central-DP model for mean and frequency estimation. It considers two correlation models: a correlated setting, in which worst-case data may be correlated with the privacy demands, and a weakly-correlated setting, in which a random permutation of the dataset weakens those correlations. It introduces the Heterogeneously Private Frequency (HPF) and Heterogeneously Private Mean (HPM) mechanisms, which weight users differently and inject Laplace noise calibrated to $\|\vec{w}/\bm\epsilon\|_{\infty}$, along with a fast heuristic, HPF-A. It provides PAC and MSE guarantees and proves minimax optimality in several regimes. Experiments on UC salary data and cancer-type data complement the theory, showing that HPF-C/HPM-C and HPF-WC/HPM-WC outperform baselines and that HPF-A often performs well in practice. The work highlights a fundamental trade-off between privacy heterogeneity and estimation accuracy, demonstrates regimes where correlations do not preclude accurate estimation, and offers efficient, scalable algorithms for settings with heterogeneous user privacy requirements.
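
The weighting-plus-Laplace-noise idea described above can be illustrated with a minimal sketch for the mean-estimation case. This is not the authors' HPM algorithm: it assumes data in $[0,1]$, nonnegative weights summing to one, and a pure Laplace mechanism, and it does not reproduce how the paper actually chooses the weights. The function names (`laplace_scale`, `private_weighted_mean`, `proportional_weights`) are hypothetical, and the proportional weighting is only an illustrative choice.

```python
import random
from typing import List, Sequence


def laplace_scale(weights: Sequence[float], epsilons: Sequence[float]) -> float:
    """Return max_i w_i / eps_i, i.e. the infinity norm of w / eps."""
    return max(w / e for w, e in zip(weights, epsilons))


def proportional_weights(epsilons: Sequence[float]) -> List[float]:
    """Illustrative choice only (an assumption): w_i proportional to eps_i."""
    total = sum(epsilons)
    return [e / total for e in epsilons]


def private_weighted_mean(
    data: Sequence[float],
    epsilons: Sequence[float],
    weights: Sequence[float],
    rng: random.Random,
) -> float:
    """Weighted mean of data in [0, 1] plus Laplace noise of scale ||w/eps||_inf."""
    if not (len(data) == len(epsilons) == len(weights)):
        raise ValueError("data, epsilons and weights must have equal length")
    if any(not 0.0 <= x <= 1.0 for x in data):
        raise ValueError("this sketch assumes data in [0, 1]")
    if any(e <= 0.0 for e in epsilons):
        raise ValueError("privacy demands must be positive")
    if any(w < 0.0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be nonnegative and sum to 1")

    scale = laplace_scale(weights, epsilons)
    weighted = sum(w * x for w, x in zip(weights, data))
    # The difference of two i.i.d. exponentials with rate 1/scale
    # is Laplace-distributed with that scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return weighted + noise


if __name__ == "__main__":
    eps = [0.1, 1.0, 1.0, 1.0]          # one user demands much more privacy
    data = [0.2, 0.4, 0.6, 0.8]
    uniform = [1.0 / len(eps)] * len(eps)
    # Uniform weights let the strictest user dictate the noise scale;
    # down-weighting that user reduces the scale at the cost of some bias.
    print("noise scale, uniform weights:     ", laplace_scale(uniform, eps))
    print("noise scale, proportional weights:", laplace_scale(proportional_weights(eps), eps))
    print("one private estimate:", private_weighted_mean(
        data, eps, proportional_weights(eps), random.Random(0)))
```

With uniform weights the noise scale is set by the most privacy-demanding user, whereas down-weighting that user shrinks the noise but moves the weighted mean away from the empirical mean; this is the kind of trade-off the weighting in the paper is meant to balance.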

Abstract

Differential Privacy (DP) is the current gold standard for ensuring privacy for statistical queries. Estimation problems under DP constraints appearing in the literature have largely focused on providing equal privacy to all users. We consider the problems of empirical mean estimation for univariate data and frequency estimation for categorical data, both subject to heterogeneous privacy constraints. Each user, contributing a sample to the dataset, is allowed to have a different privacy demand. The dataset itself is assumed to be worst-case, and we study both problems under two different formulations: first, where privacy demands and data may be correlated, and second, where correlations are weakened by random permutation of the dataset. We establish theoretical performance guarantees for our proposed algorithms, under both PAC error and mean-squared error. These performance guarantees translate to minimax optimality in several instances, and experiments confirm superior performance of our algorithms over other baseline techniques.

Paper Structure

This paper contains 37 sections, 18 theorems, 67 equations, 4 figures, 5 tables, 3 algorithms.

Key Result

Lemma 1

The proposed HPF and HPM families of algorithms satisfy the $\bm\epsilon$-DP constraint defined in Definition 1 (Heterogeneous Differential Privacy).
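
The intuition behind this guarantee can be sketched for a weighted-mean mechanism with Laplace noise. The sketch below is a standard Laplace-mechanism argument under stated assumptions (data $x_j \in [0,1]$, weights $w_j \ge 0$, and noise scale $b = \|\vec{w}/\bm\epsilon\|_{\infty}$); it is not the paper's proof, and the notation $M$, $Z$, $b$, and $p$ is introduced here only for illustration.

```latex
% Assumed mechanism: weighted sum of the data plus Laplace noise.
\begin{align*}
M(\mathbf{x}) &= \sum_{j} w_j x_j + Z,
  \qquad Z \sim \mathrm{Laplace}(b),
  \qquad b = \|\vec{w}/\bm\epsilon\|_{\infty} = \max_{j} \frac{w_j}{\epsilon_j}. \\
% Datasets x and x' differ only in user i's entry, with entries in [0,1].
\Bigl|\, \sum_{j} w_j x_j - \sum_{j} w_j x'_j \Bigr|
  &= w_i \,\lvert x_i - x'_i \rvert \;\le\; w_i. \\
% Ratio of output densities under the Laplace distribution.
\frac{p_{M(\mathbf{x})}(y)}{p_{M(\mathbf{x}')}(y)}
  &\le \exp\!\left(\frac{w_i}{b}\right)
  \;\le\; \exp(\epsilon_i),
  \qquad \text{since } b \ge \frac{w_i}{\epsilon_i}.
\end{align*}
```

Under these assumptions, each user $i$ receives their own level $\epsilon_i$ because the noise scale is set by the largest ratio $w_j/\epsilon_j$ over all users.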

Figures (4)

  • Figure 1: Heterogeneously Private Frequency (HPF)
  • Figure 1: Performance of proposed algorithms and baselines in two different settings under two different error criteria (higher is better). (a) Negative log MSE of six algorithms for three experiments in the correlated regime. (b) Negative log of the 95th empirical error quantile of the six algorithms for three experiments in the weakly-correlated regime.
  • Figure 2: Comparison of the performance of the two HPF algorithms (source labels alg:ADPF and alg:BDPF).
  • Figure 3: Heterogeneously Private Mean (HPM)

Theorems & Definitions (21)

  • Definition 1: Heterogeneous Differential Privacy
  • Definition 2: Minimax Rates for Frequency Estimation
  • Lemma 1: Privacy Guarantee
  • Theorem 1: PAC Minimax Optimality in Correlated Setting
  • Theorem 2: PAC Minimax Optimality in Weakly-correlated Setting
  • Theorem 3: MSE Minimax Optimality in Weakly-correlated Setting
  • Definition 3: Minimax Rates for Mean Estimation
  • Theorem 4: PAC Upper Bound
  • Theorem 5: MSE Upper Bound
  • Theorem 6: Implicit PAC Lower Bound
  • ...and 11 more