Detecting Low-Degree Truncation

Anindya De, Huan Li, Shivam Nadimpalli, Rocco A. Servedio

Abstract

We consider the following basic, and very broad, statistical problem: Given a known high-dimensional distribution ${\cal D}$ over $\mathbb{R}^n$ and a collection of data points in $\mathbb{R}^n$, distinguish between the two possibilities that (i) the data was drawn from ${\cal D}$, versus (ii) the data was drawn from ${\cal D}|_S$, i.e. from ${\cal D}$ subject to truncation by an unknown truncation set $S \subseteq \mathbb{R}^n$. We study this problem in the setting where ${\cal D}$ is a high-dimensional i.i.d. product distribution and $S$ is an unknown degree-$d$ polynomial threshold function (one of the most well-studied types of Boolean-valued function over $\mathbb{R}^n$). Our main results are an efficient algorithm when ${\cal D}$ is a hypercontractive distribution, and a matching lower bound:

  • For any constant $d$, we give a polynomial-time algorithm which successfully distinguishes ${\cal D}$ from ${\cal D}|_S$ using $O(n^{d/2})$ samples (subject to mild technical conditions on ${\cal D}$ and $S$);
  • Even for the simplest case of ${\cal D}$ being the uniform distribution over $\{+1, -1\}^n$, we show that for any constant $d$, any distinguishing algorithm for degree-$d$ polynomial threshold functions must use $\Omega(n^{d/2})$ samples.
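To make the two hypotheses concrete, the following is a minimal sketch of the data-generating process only, not of the paper's distinguishing algorithm. It assumes ${\cal D}$ is the uniform distribution over $\{+1, -1\}^n$ and uses a hypothetical degree-2 PTF, $f(x) = \mathbf{1}[x_1 x_2 + x_3 \ge 0]$, chosen purely for illustration. The function names and the use of rejection sampling are likewise assumptions of this sketch.

```python
import random


def sample_uniform(n, rng):
    """One draw from the uniform distribution over {+1, -1}^n."""
    return [rng.choice((1, -1)) for _ in range(n)]


def ptf_degree2(x):
    """Hypothetical degree-2 PTF (illustration only): 1 iff x_1*x_2 + x_3 >= 0."""
    return 1 if x[0] * x[1] + x[2] >= 0 else 0


def sample_truncated(n, f, rng, max_tries=10_000):
    """One draw from D|_S with S = f^{-1}(1), obtained by rejection sampling."""
    for _ in range(max_tries):
        x = sample_uniform(n, rng)
        if f(x) == 1:
            return x
    raise RuntimeError("truncation set has too little mass to sample from")


rng = random.Random(0)
n = 5

# Case (i): data drawn from D itself.
untruncated = [sample_uniform(n, rng) for _ in range(200)]

# Case (ii): data drawn from D|_S; every point satisfies the PTF.
truncated = [sample_truncated(n, ptf_degree2, rng) for _ in range(200)]
```

A distinguishing algorithm receives one of these two samples, without being told $f$, and must decide which case produced it.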

Paper Structure (20 sections, 14 theorems, 83 equations, 1 figure, 1 table, 1 algorithm)

Key Result

Theorem 3

Let $0 < \varepsilon < 1$. Fix any constant $d$ and any hypercontractive i.i.d. product distribution $\mu^{\otimes n}$ over $\mathbb{R}^n$. Let $f: \mathbb{R}^n \to \{0,1\}$ be an unknown degree-$d$ PTF such that […]. There is an efficient algorithm that uses $\Theta(n^{d/2}/\varepsilon^2)$ samples from ${\cal D}$ and successfully (w.h.p.) distinguishes between the following two cases:

Figures (1)

  • Figure 1: A draw of $\boldsymbol{f}$ from the distribution $\mathcal{F}_d$

Theorems & Definitions (34)

  • Theorem 3: Efficiently detecting PTF truncation, informal theorem statement
  • Theorem 4: Lower bound for detecting PTF truncation, informal theorem statement
  • Definition 5
  • Remark 6
  • Definition 7
  • Remark 8
  • Remark 9
  • Proposition 10: Anti-concentration of low-degree polynomials
  • proof
  • Proposition 11: Level-$k$ inequalities
  • ...and 24 more