Differentially Private Release and Learning of Threshold Functions

Mark Bun, Kobbi Nissim, Uri Stemmer, Salil Vadhan

TL;DR

The paper establishes new sample-complexity bounds for three core tasks under $(\varepsilon, \delta)$ differential privacy: releasing threshold functions, learning distributions under Kolmogorov distance, and properly PAC learning thresholds. By reducing to the interior-point problem and leveraging recursive constructions and fingerprinting-code techniques, it proves a lower bound of $n = \Omega(\log^*|X|)$ over a finite domain $X$, which implies that these tasks are impossible over infinite domains. It also provides a new upper bound of $n = 2^{(1+o(1))\log^*|X|}$, improving the previous best bound of $8^{(1+o(1))\log^*|X|}$. The lower bound extends to $\ell$-dimensional thresholds, yielding an $\Omega(\ell \cdot \log^*|X|)$ bound for properly learning thresholds in $\ell$ dimensions, and gives the first separation between the sample complexity of proper learning with $(\varepsilon, \delta)$ differential privacy and learning without privacy. The work also clarifies equivalences between threshold release, distribution learning, and the interior-point problem, offering a unified framework and several open questions on tightening the gap between the lower and upper bounds and on constructing practical differentially private algorithms for these tasks.

Abstract

We prove new upper and lower bounds on the sample complexity of $(\varepsilon, \delta)$ differentially private algorithms for releasing approximate answers to threshold functions. A threshold function $c_x$ over a totally ordered domain $X$ evaluates to $c_x(y) = 1$ if $y \le x$, and evaluates to $0$ otherwise. We give the first nontrivial lower bound for releasing thresholds with $(\varepsilon, \delta)$ differential privacy, showing that the task is impossible over an infinite domain $X$, and moreover requires sample complexity $n \ge \Omega(\log^*|X|)$, which grows with the size of the domain. Inspired by the techniques used to prove this lower bound, we give an algorithm for releasing thresholds with $n \le 2^{(1+o(1))\log^*|X|}$ samples. This improves the previous best upper bound of $8^{(1+o(1))\log^*|X|}$ (Beimel et al., RANDOM '13). Our sample complexity upper and lower bounds also apply to the tasks of learning distributions with respect to Kolmogorov distance and of properly PAC learning thresholds with differential privacy. The lower bound gives the first separation between the sample complexity of properly learning a concept class with $(\varepsilon, \delta)$ differential privacy and learning without privacy. For properly learning thresholds in $\ell$ dimensions, this lower bound extends to $n \ge \Omega(\ell \cdot \log^*|X|)$. To obtain our results, we give reductions in both directions between releasing and properly learning thresholds and the simpler interior point problem. Given a database $D$ of elements from $X$, the interior point problem asks for an element between the smallest and largest elements in $D$. We introduce new recursive constructions for bounding the sample complexity of the interior point problem, as well as further reductions and techniques for proving impossibility results for other basic problems in differential privacy.
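
To make the two central definitions concrete, the following minimal sketch spells out a threshold function $c_x$ and the interior point condition for a small integer database. It is not taken from the paper and is deliberately non-private: it only checks the definitions and adds no noise. The function names (`threshold`, `exact_threshold_answer`, `is_interior_point`) are hypothetical, and the sketch assumes the usual counting-query convention that the answer to $c_x$ on $D$ is the fraction of elements of $D$ on which $c_x$ evaluates to $1$.

```python
from typing import Callable, Sequence


def threshold(x: int) -> Callable[[int], int]:
    """Return c_x, where c_x(y) = 1 if y <= x and 0 otherwise."""
    return lambda y: 1 if y <= x else 0


def exact_threshold_answer(database: Sequence[int], x: int) -> float:
    """Fraction of database elements y with c_x(y) = 1.

    Assumption: answers are normalized counts. This is the exact,
    non-private answer that a private algorithm would approximate.
    """
    c_x = threshold(x)
    return sum(c_x(y) for y in database) / len(database)


def is_interior_point(database: Sequence[int], candidate: int) -> bool:
    """Check min(D) <= candidate <= max(D).

    The candidate does not have to be an element of the database.
    """
    return min(database) <= candidate <= max(database)


database = [3, 7, 7, 12, 20]
print(exact_threshold_answer(database, 7))  # three of the five elements are <= 7
print(is_interior_point(database, 10))      # 10 lies between 3 and 20
print(is_interior_point(database, 25))      # 25 exceeds the maximum element
```

Without privacy, the interior point problem is trivial (return any database element). The difficulty addressed in the abstract arises only once the output must satisfy $(\varepsilon, \delta)$ differential privacy.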

Paper Structure

This paper contains 37 sections, 41 theorems, 59 equations, and 5 algorithms.

Key Result

Theorem 1.2

The sample complexity of releasing threshold functions over a data universe $X$ with differential privacy is at least $\Omega(\log^* |X|)$. In particular, there is no differentially private algorithm for releasing threshold functions over an infinite data universe.
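
To give a feel for how slowly the $\log^*$ term grows, here is a small sketch of the iterated logarithm. It assumes base-2 logarithms and the common convention that $\log^* x$ counts how many times the logarithm must be applied before the value drops to at most $1$; the paper's exact convention may differ by a constant. The helper name `log_star` is hypothetical. The final lines compare the leading terms $2^{\log^*|X|}$ and $8^{\log^*|X|}$ only, ignoring the $o(1)$ terms and all constants, so the printed numbers are illustrative rather than actual sample sizes.

```python
import math


def log_star(x: float) -> int:
    """Iterated base-2 logarithm: number of times log2 is applied until x <= 1."""
    count = 0
    while x > 1:
        x = math.log2(x)
        count += 1
    return count


# Tower values 2, 2^2, 2^(2^2), 2^(2^(2^2)) each raise log* by exactly one.
for size in (2, 4, 16, 65536):
    print(size, log_star(size))

# Illustrative comparison of leading terms only (o(1) terms and constants ignored).
domain_size = 2 ** 64
k = log_star(domain_size)
print("log* of 2^64:", k)
print("2^log*:", 2 ** k, "versus 8^log*:", 8 ** k)
```

Since $\log^*|X|$ is unbounded as $|X|$ grows, however slowly, no fixed sample size suffices for every finite domain. This is consistent with the impossibility statement for infinite data universes in Theorem 1.2.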

Theorems & Definitions (86)

  • Definition 1.1
  • Theorem 1.2
  • Theorem 1.3
  • Theorem 1.4
  • Theorem 1.5
  • Theorem 1.6
  • Theorem 1.7
  • Theorem 1.8
  • Theorem 1.9
  • Theorem 2.1: The Laplace Mechanism [DworkMcNiSm06]
  • ...and 76 more