Detecting Low-Degree Truncation

Anindya De; Huan Li; Shivam Nadimpalli; Rocco A. Servedio

Abstract

We consider the following basic, and very broad, statistical problem: Given a known high-dimensional distribution ${\cal D}$ over $\mathbb{R}^n$ and a collection of data points in $\mathbb{R}^n$, distinguish between the two possibilities that (i) the data was drawn from ${\cal D}$, versus (ii) the data was drawn from ${\cal D}|_S$, i.e. from ${\cal D}$ subject to truncation by an unknown truncation set $S \subseteq \mathbb{R}^n$. We study this problem in the setting where ${\cal D}$ is a high-dimensional i.i.d. product distribution and $S$ is an unknown degree-$d$ polynomial threshold function (one of the most well-studied types of Boolean-valued function over $\mathbb{R}^n$). Our main results are an efficient algorithm when ${\cal D}$ is a hypercontractive distribution, and a matching lower bound:

$\bullet$ For any constant $d$, we give a polynomial-time algorithm which successfully distinguishes ${\cal D}$ from ${\cal D}|_S$ using $O(n^{d/2})$ samples (subject to mild technical conditions on ${\cal D}$ and $S$);

$\bullet$ Even for the simplest case of ${\cal D}$ being the uniform distribution over $\{+1, -1\}^n$, we show that for any constant $d$, any distinguishing algorithm for degree-$d$ polynomial threshold functions must use $\Omega(n^{d/2})$ samples.
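To make the distinguishing problem concrete, the following is a minimal sketch for the simplest case only: ${\cal D}$ uniform over $\{+1, -1\}^n$ and $S$ a halfspace (a degree-1 polynomial threshold function). It is not the paper's algorithm. It uses a standard pairwise inner-product statistic, which has mean zero under ${\cal D}$ and mean $\|\mathbb{E}[x]\|^2$ under ${\cal D}|_S$. The function names, the majority halfspace, the parameters $n = 50$ and $m = 2000$, and the threshold $0.1$ are all assumptions chosen for illustration.

```python
import numpy as np

def sample_uniform(rng, m, n):
    """Draw m points from the uniform distribution over {+1, -1}^n."""
    return rng.choice([-1, 1], size=(m, n))

def sample_truncated(rng, m, n, accept):
    """Draw m points from the uniform distribution conditioned on accept(x),
    using plain rejection sampling (illustrative, not efficient in general)."""
    kept, count = [], 0
    while count < m:
        batch = sample_uniform(rng, m, n)
        batch = batch[accept(batch)]
        kept.append(batch)
        count += len(batch)
    return np.concatenate(kept)[:m]

def pairwise_statistic(samples):
    """Average of <x_i, x_j> over ordered pairs i != j.
    For +/-1 vectors <x_i, x_i> = n, so the off-diagonal sum equals
    ||sum_i x_i||^2 - m * n."""
    m, n = samples.shape
    total = samples.sum(axis=0)
    return (total @ total - m * n) / (m * (m - 1))

def halfspace(x):
    """Hypothetical truncation set: the majority halfspace sum_i x_i >= 0."""
    return x.sum(axis=1) >= 0

rng = np.random.default_rng(0)
n, m = 50, 2000
stat_null = pairwise_statistic(sample_uniform(rng, m, n))
stat_trunc = pairwise_statistic(sample_truncated(rng, m, n, halfspace))

# Assumed decision rule for this toy setting: report "truncated"
# when the statistic exceeds 0.1.
print("untruncated:", stat_null, "truncated:", stat_trunc)
```

Under these assumptions the untruncated statistic concentrates near zero, with standard deviation about $\sqrt{2n}/m$, while the truncated one concentrates near the squared norm of the conditional mean, so a fixed threshold separates the two. Higher-degree truncation sets would require a statistic that also looks at higher-degree monomials, which this sketch does not attempt.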

Paper Structure (20 sections, 14 theorems, 83 equations, 1 figure, 1 table, 1 algorithm)

Introduction
Our Results
Techniques
Overview of Theorem 3
Overview of Theorem 4
Related Work
Preliminaries
Harmonic Analysis over Product Spaces
Hypercontractive Distributions
An $O(n^{d/2})$-Sample Algorithm for Degree-$d$ PTFs
Useful Preliminaries
Case 1: $\theta \geq \frac{1}{2}$.
Case 2: $0 \leq \theta < \frac{1}{2}$.
Case 3: $\theta < 0$.
Proof of \ref{thm:ptf-ub}
...and 5 more sections

Key Result

Theorem 3

Let $0 < \varepsilon < 1$. Fix any constant $d$ and any hypercontractive i.i.d. product distribution $\mu^{\otimes n}$ over $\mathbb{R}^n$. Let $f: \mathbb{R}^n \to \{0,1\}$ be an unknown degree-$d$ PTF such that There is an efficient algorithm that uses $\Theta(n^{d/2}/\varepsilon^2)$ samples from ${\cal D}$ and successfully (w.h.p.) distinguishes between the following two cases:

Figures (1)

Figure 1: A draw of $\boldsymbol{f}$ from the distribution $\mathcal{F}_d$

Theorems & Definitions (34)

Theorem 3: Efficiently detecting PTF truncation, informal theorem statement
Theorem 4: Lower bound for detecting PTF truncation, informal theorem statement
Definition 5
Remark 6
Definition 7
Remark 8
Remark 9
Proposition 10: Anti-concentration of low-degree polynomials
proof
Proposition 11: Level-$k$ inequalities
...and 24 more
