Sum-of-squares lower bounds for Non-Gaussian Component Analysis

Ilias Diakonikolas; Sushrut Karmalkar; Shuo Pang; Aaron Potechin

TL;DR

The paper proves the first super-constant degree Sum-of-Squares (SoS) lower bound for Non-Gaussian Component Analysis (NGCA). This significantly strengthens prior work by establishing a super-polynomial information-computation tradeoff against a broader family of algorithms.

Abstract

Non-Gaussian Component Analysis (NGCA) is the statistical task of finding a non-Gaussian direction in a high-dimensional dataset. Specifically, given i.i.d. samples from a distribution $P^A_{v}$ on $\mathbb{R}^n$ that behaves like a known distribution $A$ in a hidden direction $v$ and like a standard Gaussian in the orthogonal complement, the goal is to approximate the hidden direction. The standard formulation posits that the first $k-1$ moments of $A$ match those of the standard Gaussian and the $k$-th moment differs. Under mild assumptions, this problem has sample complexity $O(n)$. On the other hand, all known efficient algorithms require $\Omega(n^{k/2})$ samples. Prior work developed sharp Statistical Query and low-degree testing lower bounds suggesting an information-computation tradeoff for this problem. Here we study the complexity of NGCA in the Sum-of-Squares (SoS) framework. Our main contribution is the first super-constant degree SoS lower bound for NGCA. Specifically, we show that if the non-Gaussian distribution $A$ matches the first $k-1$ moments of $\mathcal{N}(0, 1)$ and satisfies other mild conditions, then with fewer than $n^{(1 - \varepsilon)k/2}$ samples from the normal distribution, with high probability, degree $(\log n)^{\frac{1}{2}-o_n(1)}$ SoS fails to refute the existence of such a direction $v$. Our result significantly strengthens prior work by establishing a super-polynomial information-computation tradeoff against a broader family of algorithms. As corollaries, we obtain SoS lower bounds for several problems in robust statistics and the learning of mixture models. Our SoS lower bound proof introduces a novel technique that we believe may be of broader interest, together with a number of refinements over existing methods.
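
To make the definition of $P^A_{v}$ concrete, the following is a minimal sketch of how samples from such a distribution could be drawn. It is an illustration only, not code from the paper. The function names `sample_ngca` and `rademacher` are hypothetical, and the choice of $A$ as the uniform distribution on $\{-1, +1\}$ is an assumption made for the example: its first three moments $(0, 1, 0)$ match those of $\mathcal{N}(0, 1)$, while its fourth moment is $1$ rather than $3$, so it corresponds to $k = 4$.

```python
import numpy as np


def sample_ngca(n_samples, v, sample_A, rng):
    """Draw samples that follow A along direction v and N(0, I) orthogonally.

    Hypothetical helper for illustration; `sample_A(m, rng)` must return
    m i.i.d. draws from the one-dimensional distribution A.
    """
    v = np.asarray(v, dtype=float)
    v = v / np.linalg.norm(v)                  # work with a unit direction
    g = rng.standard_normal((n_samples, v.shape[0]))
    a = sample_A(n_samples, rng)               # planted one-dimensional marginal
    proj = g @ v                               # Gaussian component along v
    # Swap the component along v for a draw from A; the orthogonal
    # complement of v keeps its standard Gaussian behaviour.
    return g + np.outer(a - proj, v)


def rademacher(m, rng):
    """Illustrative choice of A: uniform on {-1, +1}."""
    return rng.choice([-1.0, 1.0], size=m)


rng = np.random.default_rng(0)
n = 5
v = rng.standard_normal(n)
v = v / np.linalg.norm(v)                      # hidden unit direction
x = sample_ngca(1000, v, rademacher, rng)
hidden_marginal = x @ v                        # distributed as A
```

Projecting the samples onto `v` recovers the planted marginal, while projections onto any direction orthogonal to `v` remain standard Gaussian. The task described in the abstract is to recover `v` from `x` alone.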
