Private Estimation when Data and Privacy Demands are Correlated

Syomantak Chaudhuri; Thomas A. Courtade

TL;DR

The paper addresses private estimation under heterogeneous privacy demands by formulating heterogeneous differential privacy (HDP) in the central model for mean and frequency estimation, under two correlation models: a correlated regime, in which the data may be adversarially correlated with the privacy demands, and a weakly-correlated regime, in which correlations are weakened by a random permutation of the dataset. It introduces the HPF (frequency) and HPM (mean) mechanisms, which weight users differently and inject Laplace noise calibrated to $\|\,\vec{w}/\bm\epsilon\,\|_{\infty}$, together with a fast heuristic, HPF-A. It provides PAC and MSE performance guarantees and proves minimax optimality in several regimes. The theory is complemented by experiments on UC salary data and cancer-type data, which show that HPF-C/HPM-C (correlated regime) and HPF-WC/HPM-WC (weakly-correlated regime) outperform baselines and that HPF-A often performs strongly in practice. The work highlights a fundamental trade-off between privacy heterogeneity and estimation accuracy, identifies regimes where correlations do not preclude accurate estimation, and offers efficient, scalable algorithms for deployments in which users have heterogeneous privacy requirements.
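The "weight users, then add Laplace noise scaled to $\|\vec{w}/\bm\epsilon\|_\infty$" recipe can be illustrated for mean estimation with a short sketch. It is not the paper's HPM: the function name `weighted_laplace_mean` is hypothetical, the data are assumed to lie in [0, 1], and the weights are supplied by the caller (the rule HPM uses to choose them is not reproduced here).

```python
import numpy as np

def weighted_laplace_mean(x, eps, w, rng=None):
    """Hypothetical sketch: weighted mean of x plus Laplace noise of scale max_i w_i / eps_i.

    Assumptions (not taken from the paper): every x_i lies in [0, 1], the
    weights w are non-negative and sum to 1, and user i's privacy level eps_i
    must hold when only x_i changes.
    """
    x = np.asarray(x, dtype=float)
    eps = np.asarray(eps, dtype=float)
    w = np.asarray(w, dtype=float)
    if np.any(x < 0) or np.any(x > 1):
        raise ValueError("x must lie in [0, 1]")
    if np.any(w < 0) or not np.isclose(w.sum(), 1.0):
        raise ValueError("w must be non-negative and sum to 1")
    rng = np.random.default_rng() if rng is None else rng
    # Changing x_i moves the weighted sum by at most w_i, so a Laplace scale of
    # ||w / eps||_inf = max_i w_i / eps_i is large enough for every user at once.
    scale = np.max(w / eps)
    estimate = w @ x + rng.laplace(0.0, scale)
    return float(estimate), float(scale)
```

With privacy levels `eps = [0.1, 1, 1, 1]`, uniform weights give a noise scale of 2.5 because the most private user dominates, while weights proportional to `eps` give 1/3.1 ≈ 0.32 at the cost of biasing the estimate toward the less private users' values. Balancing this bias against the noise is the kind of trade-off the weighted mechanisms above are designed to manage.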

Abstract

Differential Privacy (DP) is the current gold standard for ensuring privacy for statistical queries. Estimation problems under DP constraints appearing in the literature have largely focused on providing equal privacy to all users. We consider the problems of empirical mean estimation for univariate data and frequency estimation for categorical data, both subject to heterogeneous privacy constraints. Each user, contributing a sample to the dataset, is allowed to have a different privacy demand. The dataset itself is assumed to be worst-case, and we study both problems under two different formulations: first, where privacy demands and data may be correlated, and second, where correlations are weakened by random permutation of the dataset. We establish theoretical performance guarantees for our proposed algorithms under both PAC error and mean-squared error. These performance guarantees translate to minimax optimality in several instances, and experiments confirm superior performance of our algorithms over other baseline techniques.
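For frequency estimation on categorical data, one natural instance of the same recipe is a weighted histogram with independent Laplace noise on every cell. The sketch below is a hypothetical illustration, not the paper's HPF: `weighted_laplace_frequency` is an invented name, the weights are caller-supplied, and the factor 2 in the noise scale comes from the standard L1 sensitivity of a histogram when one user's category is replaced (two cells change by `w_i` each). The paper's exact constant may differ.

```python
import numpy as np

def weighted_laplace_frequency(x, eps, w, k, rng=None):
    """Hypothetical sketch: noisy weighted histogram over categories {0, ..., k-1}.

    Assumptions (not taken from the paper): w is non-negative and sums to 1,
    and replacing user i's category changes two cells by w_i each, so the
    L1 change is 2 * w_i.
    """
    x = np.asarray(x, dtype=int)
    eps = np.asarray(eps, dtype=float)
    w = np.asarray(w, dtype=float)
    if np.any(x < 0) or np.any(x >= k):
        raise ValueError("categories must lie in {0, ..., k-1}")
    if np.any(w < 0) or not np.isclose(w.sum(), 1.0):
        raise ValueError("w must be non-negative and sum to 1")
    rng = np.random.default_rng() if rng is None else rng
    # Weighted counts: hist[c] is the total weight of users whose category is c.
    hist = np.bincount(x, weights=w, minlength=k)
    # Per-cell Laplace noise; scale 2 * max_i w_i / eps_i covers every user.
    scale = 2.0 * np.max(w / eps)
    return hist + rng.laplace(0.0, scale, size=k), float(scale)
```

With uniform weights, this reduces to the usual Laplace histogram calibrated to the strictest privacy level in the dataset.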

Paper Structure (37 sections, 18 theorems, 67 equations, 4 figures, 5 tables, 3 algorithms)

Introduction
Our Contribution and Problem Description
Related Work
Problem Definition
Notation
Problem Definition
Modeling Choice & Motivation
Algorithm Description
Representative Experiments
Performance Analysis
Minimax Optimality in Correlated Regime
Minimax Optimality in Weakly-correlated Regime
Extended Discussion
Future Work: Alternate threat models for HDP
Mean Estimation: Definition and HPM Mechanism
...and 22 more sections

Key Result

Lemma 1

The proposed algorithms HPF and HPM satisfy the $\bm\epsilon$-DP constraint of Definition 1 (Heterogeneous Differential Privacy).
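A short argument for why Laplace noise at scale $\|\vec{w}/\bm\epsilon\|_\infty$ suffices can be written out for the mean-estimation case. It is a sketch under assumptions not stated in this summary, not the paper's proof: each $x_j \in [0,1]$, the weights $w_j$ are non-negative, the output is $\mu + Z$ with $\mu = \langle \vec w, x \rangle$ and $Z \sim \mathrm{Laplace}(b)$, and user $i$'s requirement is that replacing only $x_i$ (giving $x'$ and $\mu' = \langle \vec w, x' \rangle$) changes the output density by at most a factor $e^{\epsilon_i}$. With $p_\mu(t) = \frac{1}{2b} e^{-|t-\mu|/b}$ and the triangle inequality:

```latex
\begin{align*}
\frac{p_{\mu}(t)}{p_{\mu'}(t)}
  &= \exp\!\left(\frac{|t-\mu'| - |t-\mu|}{b}\right)
   \le \exp\!\left(\frac{|\mu-\mu'|}{b}\right) \\
  &= \exp\!\left(\frac{w_i\,|x_i - x_i'|}{b}\right)
   \le \exp\!\left(\frac{w_i}{b}\right)
   \le e^{\epsilon_i}
   \quad\text{when } b = \Big\|\frac{\vec w}{\bm\epsilon}\Big\|_\infty = \max_j \frac{w_j}{\epsilon_j}.
\end{align*}
```

The last step holds for every $i$ at once because $b \ge w_i/\epsilon_i$ for each user, which is why a single noise draw can serve all privacy levels simultaneously.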

Figures (4)

Figure 1: Heterogeneously Private Frequency (HPF)
Figure 1: Performance of the proposed algorithms and baselines in two settings under two error criteria (higher is better). (a): Negative log MSE of six algorithms on three experiments in the correlated regime. (b): Negative log of the 95th empirical error quantile for the six algorithms on three experiments in the weakly-correlated regime.
Figure 2: Comparison of the performance of the HPF algorithms alg:ADPF and alg:BDPF.
Figure 3: Heterogeneously Private Mean (HPM)
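The two criteria plotted in Figure 1 can be computed from repeated runs of an estimator, as in the sketch below. The helper name `error_summaries` is hypothetical, the natural logarithm is assumed, and the error is taken as the absolute error of a scalar estimate. The log base and the error norm used for vector-valued frequency estimates are not stated in this summary.

```python
import numpy as np

def error_summaries(estimates, truth):
    """Hypothetical helper: (-log MSE, -log 95th-percentile absolute error) over trials.

    Assumes scalar estimates and the natural log; higher values mean better accuracy.
    """
    errors = np.abs(np.asarray(estimates, dtype=float) - truth)
    mse = np.mean(errors ** 2)
    q95 = np.quantile(errors, 0.95)  # empirical 95th error quantile (linear interpolation)
    return float(-np.log(mse)), float(-np.log(q95))
```

For example, four trials with estimates `[1.1, 0.9, 1.0, 1.2]` of a true value `1.0` give an MSE of 0.015 and a 95th error quantile of 0.185 before taking negative logs.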

Theorems & Definitions (21)

Definition 1: Heterogeneous Differential Privacy
Definition 2: Minimax Rates for Frequency Estimation
Lemma 1: Privacy Guarantee
Theorem 1: PAC Minimax Optimality in Correlated Setting
Theorem 2: PAC Minimax Optimality in Weakly-correlated Setting
Theorem 3: MSE Minimax Optimality in Weakly-correlated Setting
Definition 3: Minimax Rates for Mean Estimation
Theorem 4: PAC Upper Bound
Theorem 5: MSE Upper Bound
Theorem 6: Implicit PAC Lower Bound
...and 11 more