Testing Convex Truncation

Anindya De; Shivam Nadimpalli; Rocco A. Servedio

TL;DR

Any algorithm that can distinguish $N(0,I_n)$ from $N(0,I_n)$ conditioned on an unknown symmetric convex set must use $\Omega(n)$ samples, so the $O(n)$ sample complexity of each of the paper's algorithms is optimal up to a constant factor.

Abstract

We study the basic statistical problem of testing whether normally distributed $n$-dimensional data has been truncated, i.e., altered by only retaining points that lie in some unknown truncation set $S \subseteq \mathbb{R}^n$. As our main algorithmic results, (1) we give a computationally efficient $O(n)$-sample algorithm that can distinguish the standard normal distribution $N(0,I_n)$ from $N(0,I_n)$ conditioned on an unknown and arbitrary convex set $S$; (2) we give a different computationally efficient $O(n)$-sample algorithm that can distinguish $N(0,I_n)$ from $N(0,I_n)$ conditioned on an unknown and arbitrary mixture of symmetric convex sets. These results stand in sharp contrast with known results for learning or testing convex bodies with respect to the normal distribution or learning convex-truncated normal distributions, where state-of-the-art algorithms require essentially $n^{\sqrt{n}}$ samples. An easy argument shows that no finite number of samples suffices to distinguish $N(0,I_n)$ from $N(0,I_n)$ conditioned on an unknown and arbitrary mixture of general (not necessarily symmetric) convex sets, so no common generalization of results (1) and (2) above is possible. We also prove that any algorithm (computationally efficient or otherwise) that can distinguish $N(0,I_n)$ from $N(0,I_n)$ conditioned on an unknown symmetric convex set must use $\Omega(n)$ samples. This shows that the sample complexity of each of our algorithms is optimal up to a constant factor.
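To make the testing problem in the abstract concrete, the following is a minimal sketch and not the paper's algorithm. It relies on one known fact: conditioning $N(0,I_n)$ on a symmetric convex set cannot increase the expected squared norm above $n$. The slab $\{x : |x_1| \le 1\}$, the dimension, the sample count, and the threshold `gap` are all assumptions chosen for illustration, and the names `sample_truncated` and `looks_truncated` are hypothetical.

```python
import random

def sample_gaussian(n, rng):
    """Draw one point from N(0, I_n)."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def sample_truncated(n, rng, in_set):
    """Rejection-sample N(0, I_n) conditioned on in_set(x) being True."""
    while True:
        x = sample_gaussian(n, rng)
        if in_set(x):
            return x

def looks_truncated(samples, gap):
    """Flag truncation when the average squared norm falls below n - gap."""
    n = len(samples[0])
    avg_sq_norm = sum(sum(c * c for c in x) for x in samples) / len(samples)
    return avg_sq_norm < n - gap

rng = random.Random(0)
n, T = 10, 8000  # illustrative dimension and sample count (assumptions)

def slab(x):
    """Hypothetical symmetric convex truncation set {x : |x_1| <= 1}."""
    return abs(x[0]) <= 1.0

null_samples = [sample_gaussian(n, rng) for _ in range(T)]
trunc_samples = [sample_truncated(n, rng, slab) for _ in range(T)]

# gap=0.35 is an ad hoc threshold for this toy setup, not a value from the paper.
null_flag = looks_truncated(null_samples, gap=0.35)
trunc_flag = looks_truncated(trunc_samples, gap=0.35)
print(null_flag, trunc_flag)
```

In this toy setup the slab lowers the expected squared norm by roughly 0.7, which is far larger than the sampling noise at these sizes, so the two cases separate cleanly. The sketch does not attempt to reproduce the dependence on the truncation volume or the thresholds analysed in the paper.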

Paper Structure (28 sections, 21 theorems, 82 equations, 1 figure, 2 algorithms)

Introduction
Our Results
Efficient Algorithms
An Information-Theoretic Lower Bound
Techniques
Related Work
Preliminaries
Basic Notation and Background
Notation.
Geometry.
Gaussian Distributions.
Gaussian Mean Testing.
Distinguishing Distributions.
Convex Influences
The Brascamp-Lieb Inequality
...and 13 more sections

Key Result

Theorem 1

There is an algorithm Symm-Convex-Distinguisher which uses $O(n/\varepsilon^2)$ samples, runs in $O(n^2/\varepsilon^2)$ time, and distinguishes between the standard $N(0,I_n)$ distribution and any distribution ${\cal D}=N(0,I_n)|_S$ where $S \subset \mathbb{R}^n$ is any symmetric convex set with Gau

Figures (1)

Figure 1: The "small inradius" ($r_\mathrm{in} \leq 0.1$) setting in the analysis of Algorithm \ref{alg:convex}, with $\mu$ denoting the center of mass of $K$. Our estimator for (a) is $\mathrm{Avg}\left( \|\boldsymbol{x}^{(j)}\|^2 \right)$, whereas for (b) we simply estimate $\mu$.
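The caption names two statistics: the average squared norm and an estimate of the center of mass $\mu$. The sketch below illustrates why a mean estimate can help when the truncation set is not symmetric. It is an assumption-laden toy, not the paper's procedure: the halfspace $\{x : x_1 \ge 0\}$, the sizes, and the threshold are chosen for illustration, and `mean_looks_shifted` is a hypothetical name. For this halfspace the expected squared norm stays exactly $n$, so a norm-only statistic cannot detect the truncation, while the mean moves to about $0.8$ in the first coordinate.

```python
import random

def empirical_mean(samples):
    """Coordinate-wise average of a list of equal-length vectors."""
    n = len(samples[0])
    count = len(samples)
    return [sum(x[i] for x in samples) / count for i in range(n)]

def mean_looks_shifted(samples, threshold):
    """Flag truncation when the squared norm of the empirical mean is large."""
    mu_hat = empirical_mean(samples)
    return sum(m * m for m in mu_hat) > threshold

rng = random.Random(1)
n, T = 10, 4000  # illustrative dimension and sample count (assumptions)

null_samples = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(T)]

# N(0, I_n) conditioned on the halfspace {x : x_1 >= 0}: the first coordinate
# is half-normal (|g| for standard normal g), the others are untouched.
half_samples = [
    [abs(rng.gauss(0.0, 1.0))] + [rng.gauss(0.0, 1.0) for _ in range(n - 1)]
    for _ in range(T)
]

# threshold=0.1 is an ad hoc choice for this toy setup, not a value from the paper.
null_shift = mean_looks_shifted(null_samples, threshold=0.1)
half_shift = mean_looks_shifted(half_samples, threshold=0.1)
print(null_shift, half_shift)
```

This naive mean estimate is used only for clarity. The outline lists a Gaussian mean testing primitive, which may be more sample-efficient than plain averaging.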

Theorems & Definitions (36)

Theorem 1: Symmetric convex truncations, informal statement
Theorem 2: Mixtures of symmetric convex truncations, informal statement
Theorem 3: General convex truncations, informal statement
Theorem 4: Lower bound, informal statement
Proposition 5: Theorem 1.1 and Remark 1.2 of DKP-SOSA
Definition 7: Convex influence
Proposition 9: Poincaré for convex influences for symmetric convex sets
Proposition 10: Poincaré for convex influences for general convex sets
Proposition 11: Brascamp-Lieb inequality
Proposition 12: Lemma 4.7 of Vempala2010
...and 26 more
