Missing Mass for Differentially Private Domain Discovery

Travis Dick, Matthew Joseph, Vinod Raman

Abstract

We study several problems in differentially private domain discovery, where each user holds a subset of items from a shared but unknown domain, and the goal is to output an informative subset of items. For set union, we show that the simple baseline Weighted Gaussian Mechanism (WGM) has a near-optimal $\ell_1$ missing mass guarantee on Zipfian data as well as a distribution-free $\ell_\infty$ missing mass guarantee. We then apply the WGM as a domain-discovery precursor for existing known-domain algorithms for private top-$k$ and $k$-hitting set and obtain new utility guarantees for their unknown domain variants. Finally, experiments demonstrate that all of our WGM-based methods are competitive with or outperform existing baselines for all three problems.
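
The abstract names the Weighted Gaussian Mechanism but does not spell it out. The following is a minimal sketch of one common formulation of weighted-Gaussian private set union, not the paper's exact algorithm. It assumes each user is truncated to at most `delta0` items, each kept item receives weight $1/\sqrt{|W_i|}$, and an item is released when its noisy total weight exceeds a threshold. The names `weighted_gaussian_mechanism` and `missing_mass` and the parameters `sigma` and `threshold` are hypothetical. The calibration of `sigma` and `threshold` to a privacy budget $(\varepsilon, \delta)$ is deliberately left out. The `missing_mass` helper assumes the $\ell_1$ missing mass is the fraction of all item occurrences whose item is not released.

```python
import math
import random
from collections import defaultdict


def weighted_gaussian_mechanism(user_sets, delta0, sigma, threshold, rng=None):
    """Sketch of a weighted-Gaussian set union (assumed formulation).

    user_sets: iterable of per-user item collections.
    delta0:    maximum number of items kept per user (assumed truncation rule).
    sigma:     Gaussian noise scale; must be calibrated to (eps, delta) elsewhere.
    threshold: release threshold; must also be calibrated elsewhere.
    """
    rng = rng or random.Random()
    weights = defaultdict(float)
    for items in user_sets:
        items = sorted(set(items))
        if len(items) > delta0:
            # Keep a uniformly random subset of delta0 items for this user.
            items = rng.sample(items, delta0)
        if not items:
            continue
        # Each user's weight vector has unit l2 norm.
        w = 1.0 / math.sqrt(len(items))
        for item in items:
            weights[item] += w
    # Release items whose noisy weight clears the threshold.
    return {
        item
        for item, total in weights.items()
        if total + rng.gauss(0.0, sigma) > threshold
    }


def missing_mass(user_sets, released):
    """Fraction of item occurrences whose item was not released (assumed l1 notion)."""
    total = 0
    missed = 0
    for items in user_sets:
        for item in set(items):
            total += 1
            if item not in released:
                missed += 1
    return missed / total if total else 0.0
```

With `sigma = 0` the sketch is deterministic, which makes the weighting easy to inspect: frequent items accumulate weight across users, while rare items held by users with large sets stay below the threshold.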

Paper Structure (40 sections, 24 theorems, 96 equations, 13 figures, 1 table, 4 algorithms)

Key Result

Lemma 3.1

Let $W$ be any $(C, s)$-Zipfian dataset. Then $\max_i |W_i| \leq (CN)^{1/s}$.

Figures (13)

  • Figure 1: Set Union MM as a function of $\Delta_0$. Note that lower is better.
  • Figure 2: Top-$k$ MM as a function of $k$, using $\Delta_0 = 100$.
  • Figure 3: Number of missed users as a function of $k$, using $\Delta_0 = 100$.
  • Figure 4: Log-log plot of frequency vs. rank for large datasets
  • Figure 5: Log-log plot of frequency vs. rank for small datasets
  • ...and 8 more figures

Theorems & Definitions (46)

  • Definition 2.1 (DMNS06)
  • Definition 2.2
  • Definition 3.1: $(C, s)$-Zipfian
  • Lemma 3.1
  • Theorem 3.2 (Theorem 5.1 of gopi2020differentially)
  • Theorem 3.3
  • Corollary 3.4
  • Theorem 3.5
  • Theorem 3.6
  • Definition 4.1
  • ...and 36 more