Sample-Optimal Locally Private Hypothesis Selection and the Provable Benefits of Interactivity

Alireza F. Pour, Hassan Ashtiani, Shahab Asoodeh

TL;DR

An LDP algorithm that uses the notion of \emph{critical queries} for a Statistical Query Algorithm (SQA) and breaks the known lower bound of $\Omega\left(\frac{k\log k}{\alpha^2\min \{ \varepsilon^2 ,1\}} \right)$ for the sample complexity of non-interactive hypothesis selection.

Abstract

We study the problem of hypothesis selection under the constraint of local differential privacy. Given a class $\mathcal{F}$ of $k$ distributions and a set of i.i.d. samples from an unknown distribution $h$, the goal of hypothesis selection is to pick a distribution $\hat{f}$ whose total variation distance to $h$ is comparable with the best distribution in $\mathcal{F}$ (with high probability). We devise an $\varepsilon$-locally-differentially-private ($\varepsilon$-LDP) algorithm that uses $\Theta\left(\frac{k}{\alpha^2\min \{\varepsilon^2,1\}}\right)$ samples to guarantee that $d_{TV}(h,\hat{f})\leq \alpha + 9 \min_{f\in \mathcal{F}}d_{TV}(h,f)$ with high probability. This sample complexity is optimal for $\varepsilon<1$, matching the lower bound of Gopi et al. (2020). All previously known algorithms for this problem required $\Omega\left(\frac{k\log k}{\alpha^2\min \{ \varepsilon^2 ,1\}} \right)$ samples to work. Moreover, our result demonstrates the power of interaction for $\varepsilon$-LDP hypothesis selection. Namely, it breaks the known lower bound of $\Omega\left(\frac{k\log k}{\alpha^2\min \{ \varepsilon^2 ,1\}} \right)$ for the sample complexity of non-interactive hypothesis selection. Our algorithm breaks this barrier using only $\Theta(\log \log k)$ rounds of interaction. To prove our results, we define the notion of \emph{critical queries} for a Statistical Query Algorithm (SQA) which may be of independent interest. Informally, an SQA is said to use a small number of critical queries if its success relies on the accuracy of only a small number of queries it asks. We then design an LDP algorithm that uses a smaller number of critical queries.
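To make the ingredients above concrete, here is a minimal sketch of how a single statistical query (the mass of a Scheffé set under $h$) could be answered under $\varepsilon$-LDP and used to compare two hypotheses. It assumes standard binary randomized response and discrete distributions given as dictionaries; it is not the paper's algorithm, and all function names are hypothetical.

```python
import math
import random


def randomized_response(bit, epsilon, rng):
    """Report `bit` truthfully with probability e^eps / (1 + e^eps), else flip it.

    The ratio of the two report probabilities is e^eps, so one report is eps-LDP.
    """
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if rng.random() < p_truth else 1 - bit


def ldp_query_estimate(samples, in_set, epsilon, rng):
    """Debiased estimate of Pr_{x ~ h}[x in S] from one privatized bit per sample."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    reports = [randomized_response(int(in_set(x)), epsilon, rng) for x in samples]
    mean_report = sum(reports) / len(reports)
    # E[report] = (2p - 1) * q + (1 - p), where q is the true mass of S; invert it.
    return (mean_report - (1.0 - p_truth)) / (2.0 * p_truth - 1.0)


def scheffe_set(f, g):
    """Scheffé set of two discrete distributions: {x : f(x) > g(x)}."""
    return {x for x in f if f[x] > g.get(x, 0.0)}


def scheffe_test(f, g, samples, epsilon, rng):
    """Return "f" or "g", whichever assigns the Scheffé set a mass closer to the estimate."""
    s = scheffe_set(f, g)
    estimate = ldp_query_estimate(samples, lambda x: x in s, epsilon, rng)
    f_mass = sum(f[x] for x in s)
    g_mass = sum(g.get(x, 0.0) for x in s)
    return "f" if abs(f_mass - estimate) <= abs(g_mass - estimate) else "g"
```

Each sample contributes exactly one privatized bit, so the estimate's noise grows as $\varepsilon$ shrinks. This is the kind of per-query accuracy cost that the sample-complexity bounds in the abstract account for.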

Paper Structure

This paper contains 35 sections, 16 theorems, 30 equations, 1 table, 6 algorithms.

Key Result

Theorem 2

There exists a family of $k$ distributions for which any (interactive) $\varepsilon$-LDP selection method requires at least $\Omega\left(\frac{k}{\alpha^2 \min\{\varepsilon,\varepsilon^2\}}\right)$ samples to learn it.
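As a quick consistency check (a worked comparison of the two stated bounds, not an additional claim from the paper), this lower bound can be compared with the sample complexity $\Theta\left(\frac{k}{\alpha^2\min\{\varepsilon^2,1\}}\right)$ given in the abstract. For $\varepsilon<1$,

$$\min\{\varepsilon,\varepsilon^2\} = \varepsilon^2 = \min\{\varepsilon^2,1\}, \qquad \text{so both bounds equal } \frac{k}{\alpha^2\varepsilon^2} \text{ up to constants.}$$

For $\varepsilon\geq 1$,

$$\min\{\varepsilon,\varepsilon^2\} = \varepsilon, \qquad \min\{\varepsilon^2,1\} = 1,$$

so the lower bound becomes $\Omega\left(\frac{k}{\alpha^2\varepsilon}\right)$ while the upper bound is of order $\frac{k}{\alpha^2}$. The two differ by a factor of $\varepsilon$, which is consistent with the abstract claiming optimality only for $\varepsilon<1$.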

Theorems & Definitions (40)

  • Definition 1: Hypothesis Selection
  • Theorem 2: Informal, Theorem 1.2 of Gopi et al. (2020), Corollary 6 of Duchi and Rogers (2019)
  • Theorem 3: Informal, Corollary 5.10 of Gopi et al. (2020)
  • Theorem 4: Informal, Theorem 3.3 of Gopi et al. (2020)
  • Theorem 5: Informal Version of Theorem \ref{thm:ours}
  • Corollary 6: Sample Complexity of LDP Hypothesis Selection
  • Definition 7: Statistical Query Oracle
  • Definition 8: Statistical Query Algorithm
  • Definition 9: Scheffé Set
  • Theorem 10: Analysis of Scheffé Test
  • ...and 30 more