Strict Optimality of Frequency Estimation Under Local Differential Privacy

Mingen Pan


Abstract

This paper establishes strict optimality in precision for frequency estimation under local differential privacy (LDP). We prove that a frequency estimator with a symmetric, extremal configuration and a constant support size equal to an optimized value is sufficient to achieve maximum precision. Furthermore, we show that the communication cost of such an optimal estimator can be as low as $\log_2(\frac{d(d-1)}{2}+1)$ bits, where $d$ denotes the dictionary size, and we propose an algorithm to generate this optimal estimator. In addition, we introduce a modified Count-Mean Sketch and demonstrate that it is practically indistinguishable from the theoretical optimum once the dictionary is sufficiently large (e.g., $d=100$ for a privacy factor of $\epsilon = 1$). We compare existing methods with our proposed optimal estimator to provide guidelines for choosing among them in practical deployments. Finally, we evaluate these estimators experimentally and find that the empirical results are consistent with our theoretical derivations.
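The communication-cost bound stated in the abstract is easy to evaluate numerically. The sketch below computes $\log_2(\frac{d(d-1)}{2}+1)$ directly and, as an illustrative assumption not taken from the paper, the number of whole bits a fixed-length code over $\frac{d(d-1)}{2}+1$ symbols would need. Both function names are hypothetical.

```python
import math


def optimal_comm_cost_bits(d: int) -> float:
    """Communication cost stated in the abstract: log2(d(d-1)/2 + 1) bits."""
    if d < 2:
        raise ValueError("dictionary size d must be at least 2")
    return math.log2(d * (d - 1) / 2 + 1)


def fixed_length_bits(d: int) -> int:
    """Whole bits to index M = d(d-1)/2 + 1 symbols with a fixed-length code, i.e. ceil(log2 M)."""
    m = d * (d - 1) // 2 + 1
    # For m >= 1, (m - 1).bit_length() equals ceil(log2(m)).
    return (m - 1).bit_length()


if __name__ == "__main__":
    for d in (2, 10, 100, 1000):
        print(d, round(optimal_comm_cost_bits(d), 2), fixed_length_bits(d))
```

For $d = 100$ the bound is about 12.27 bits (13 bits with a fixed-length code), and for large $d$ it grows roughly like $2\log_2 d - 1$.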


Paper Structure (32 sections, 30 theorems, 91 equations, 1 figure, 1 table)


Introduction
Previous Work
Background
Local Differential Privacy (LDP)
Frequency Estimation
Extremal Configuration
Symmetric Configuration
Strict Optimality in Precision
Communication Cost
Optimal Algorithms
Subset Selection
Optimized Count Mean Sketch
Weighted Subset Selection
Choosing an Optimal Algorithm
Experiment
...and 17 more sections

Key Result

Proposition 1

Let $d$, $\epsilon$, $n$, and $\hat{f}$ denote the dictionary size (the number of all possible inputs), the privacy factor, the dataset size, and the frequency estimator, respectively. When $d \ge e^{\epsilon} + 1$, we have

Otherwise,
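Proposition 1 splits on whether $d \ge e^{\epsilon} + 1$. The sketch below only checks which case applies. One possible intuition, offered as an assumption rather than as the paper's argument: in subset selection a commonly used support size is $k \approx d/(e^{\epsilon}+1)$, and $d \ge e^{\epsilon}+1$ is exactly the condition under which that value is at least 1; below the threshold the support collapses to a single item, which is generalized randomized response. The function names are hypothetical, and the rounding rule is an illustration, not the optimized support size derived in the paper.

```python
import math


def in_large_dictionary_regime(d: int, eps: float) -> bool:
    """True when d >= e^eps + 1, the first case of Proposition 1."""
    return d >= math.exp(eps) + 1


def heuristic_subset_size(d: int, eps: float) -> int:
    """Hypothetical helper: round d / (e^eps + 1), clamped to [1, d - 1].

    This is a subset-selection rule of thumb assumed for illustration,
    not the optimized support size from the paper.
    """
    if d < 2:
        raise ValueError("dictionary size d must be at least 2")
    k_real = d / (math.exp(eps) + 1)
    return min(max(1, round(k_real)), d - 1)


if __name__ == "__main__":
    for d, eps in [(3, 1.0), (4, 1.0), (100, 1.0), (100, 4.0)]:
        print(d, eps, in_large_dictionary_regime(d, eps), heuristic_subset_size(d, eps))
```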

Figures (1)

Figure 1: $\mathcal{L}_1$ and $\mathcal{L}_2$ losses vs. privacy factor $\epsilon$ on the Zipf dataset. See the section on the Zipf dataset experiment for details.
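The kind of measurement summarized in Figure 1 can be sketched end to end. The code below is a minimal, hypothetical reconstruction, not the paper's experimental code: it draws inputs from a Zipf distribution (the exponent `s = 1.1` is an assumption), privatizes each input with the k-subset selection mechanism (each report is a size-$k$ subset, and subsets containing the true input are weighted by $e^{\epsilon}$), debiases the membership counts, and computes the losses. This is a subset-selection baseline, not the paper's optimal estimator or its modified Count-Mean Sketch. It assumes $\mathcal{L}_1 = \sum_j |\hat{f}_j - f_j|$ and $\mathcal{L}_2 = \sum_j (\hat{f}_j - f_j)^2$ over normalized frequencies; the paper's definitions may differ in scaling or use expectations. The support size $k \approx d/(e^{\epsilon}+1)$ is an assumed rule of thumb, and all function names are hypothetical.

```python
import math
import random


def zipf_frequencies(d, s=1.1):
    """True frequencies f_j proportional to 1 / j^s for j = 1..d (s = 1.1 is an assumed exponent)."""
    weights = [1.0 / (j + 1) ** s for j in range(d)]
    total = sum(weights)
    return [w / total for w in weights]


def subset_selection_probs(d, k, eps):
    """p = P(j in report | input j), q = P(j in report | input != j) for k-subset selection."""
    ee = math.exp(eps)
    denom = k * ee + d - k
    p = k * ee / denom
    q = k * ((k - 1) * ee + (d - k)) / ((d - 1) * denom)
    return p, q


def perturb(x, d, k, p, rng):
    """Return a random k-subset of range(d) that contains x with probability p."""
    keep_true = rng.random() < p
    m = k - 1 if keep_true else k
    # Draw m distinct items from range(d) minus {x}: sample range(d - 1), shift values >= x up by one.
    subset = [y if y < x else y + 1 for y in rng.sample(range(d - 1), m)]
    if keep_true:
        subset.append(x)
    return subset


def estimate(reports, d, p, q):
    """Unbiased frequency estimates from membership counts: (c_j / n - q) / (p - q)."""
    counts = [0] * d
    for report in reports:
        for j in report:
            counts[j] += 1
    n = len(reports)
    return [(c / n - q) / (p - q) for c in counts]


def losses(f_hat, f):
    """Assumed definitions: L1 = sum |f_hat - f|, L2 = sum (f_hat - f)^2."""
    l1 = sum(abs(a - b) for a, b in zip(f_hat, f))
    l2 = sum((a - b) ** 2 for a, b in zip(f_hat, f))
    return l1, l2


def run(d=50, eps=1.0, n=20000, seed=0):
    rng = random.Random(seed)
    f = zipf_frequencies(d)
    # Rule-of-thumb support size k ~ d / (e^eps + 1), clamped to [1, d - 1] (an assumption).
    k = min(max(1, round(d / (math.exp(eps) + 1))), d - 1)
    p, q = subset_selection_probs(d, k, eps)
    inputs = rng.choices(range(d), weights=f, k=n)
    reports = [perturb(x, d, k, p, rng) for x in inputs]
    f_hat = estimate(reports, d, p, q)
    return f, f_hat, losses(f_hat, f)


if __name__ == "__main__":
    for eps in (0.5, 1.0, 2.0):
        _, _, (l1, l2) = run(eps=eps)
        print(f"eps={eps}: L1={l1:.4f}, L2={l2:.6f}")
```

Because every report contains exactly $k$ items and $p + (d-1)q = k$, the estimates always sum to 1, although individual entries can be negative. Sweeping `eps` over a grid yields loss curves of the same kind as Figure 1, though not necessarily the same values.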

Theorems & Definitions (31)

Proposition 1
Theorem 2.1
Theorem 2.2
Theorem 2.3
Lemma 2.1
Theorem 2.4
Theorem 2.5
Claim 2.1
Lemma 2.2
Theorem 2.6
...and 21 more
