
Computation-information gap in high-dimensional clustering

Bertrand Even, Christophe Giraud, Nicolas Verzelen

TL;DR

In order to prove a non-asymptotic low-degree polynomial computational barrier for clustering in high dimensions, sophisticated combinatorial arguments are developed to upper-bound the mixed moments of the signal under a Bernoulli Bayesian model.

Abstract

We investigate the existence of a fundamental computation-information gap for the problem of clustering a mixture of isotropic Gaussians in the high-dimensional regime, where the ambient dimension $p$ is larger than the number $n$ of points. The existence of a computation-information gap in a specific Bayesian high-dimensional asymptotic regime was conjectured in arXiv:1610.02918, based on the replica heuristic from statistical physics. We provide evidence that such a gap exists generically in the high-dimensional regime $p \geq n$ by (i) proving a non-asymptotic low-degree polynomial computational barrier for clustering in high dimensions, matching the performance of the best known polynomial-time algorithms, and (ii) establishing that the information barrier for clustering is smaller than the computational barrier when the number $K$ of clusters is large enough. These results contrast with the (moderately) low-dimensional regime $n \geq \mathrm{poly}(p, K)$, where there is no computation-information gap for clustering a mixture of isotropic Gaussians. To prove our low-degree computational barrier, we develop sophisticated combinatorial arguments to upper-bound the mixed moments of the signal under a Bernoulli Bayesian model.
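To make the setting concrete, here is a minimal toy simulation of a mixture of $K$ isotropic Gaussians with $p \geq n$. The parameterization is an assumption for illustration only and is not the Bernoulli prior used in the paper: the hypothetical function `sample_mixture` draws centers with i.i.d. $\mathcal{N}(0, \Delta^2/(2p))$ entries, so that two distinct centers are at expected squared distance $\Delta^2$ (a separation parameter named here for the sketch), and adds standard Gaussian noise. The hypothetical helper `mean_inner_products` compares the average Gram entry $\langle X_i, X_j \rangle$ over same-cluster and cross-cluster pairs: same-cluster entries are shifted by $\|\mu_k\|^2$, while each individual entry carries noise with standard deviation of order $\sqrt{p}$ or larger, which hints at why clustering gets harder as $p$ grows.

```python
import math
import random


def sample_mixture(n, p, K, delta, seed=0):
    """Draw n points in R^p from a K-component isotropic Gaussian mixture.

    Hypothetical toy parameterization (not the paper's prior): centers have
    i.i.d. N(0, delta^2 / (2p)) entries, so E||mu_k - mu_l||^2 = delta^2 for
    k != l; labels are uniform on {0, ..., K-1}; noise is N(0, I_p).
    """
    rng = random.Random(seed)
    sd_center = delta / math.sqrt(2 * p)
    centers = [[rng.gauss(0.0, sd_center) for _ in range(p)] for _ in range(K)]
    labels = [rng.randrange(K) for _ in range(n)]
    points = [[c + rng.gauss(0.0, 1.0) for c in centers[z]] for z in labels]
    return points, labels


def mean_inner_products(points, labels):
    """Average <X_i, X_j> over pairs i < j, split into same / different cluster."""
    same, diff = [], []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            ip = sum(a * b for a, b in zip(points[i], points[j]))
            (same if labels[i] == labels[j] else diff).append(ip)
    return sum(same) / len(same), sum(diff) / len(diff)


if __name__ == "__main__":
    n, p, K = 60, 200, 3  # high-dimensional regime: p >= n
    X, z = sample_mixture(n, p, K, delta=20.0)
    s, d = mean_inner_products(X, z)
    # Same-cluster pairs share ||mu_k||^2 (about delta^2 / 2 in expectation),
    # but each single inner product also has noise of order sqrt(p) or more.
    print(f"same-cluster mean: {s:.1f}, cross-cluster mean: {d:.1f}")
```

With the large separation used here the two averages are far apart; averaging over many pairs hides how noisy each single entry is, so this only illustrates the scales involved and is not a clustering algorithm.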

Paper Structure

This paper contains 51 sections, 39 theorems, 220 equations, 2 figures, 1 algorithm.

Key Result

Theorem 1

Let $D\in \mathbb{N}$. If $p\geq n$ and $\zeta_{n}:=\frac{\bar{\Delta}^{4}D^{8}(1+D)^{4}}{p}\max\left(\frac{n}{K^{2}},1\right)<1$, then under the prior of Definition def:prior, we have … In particular, if $\bar{\Delta}^2 \ll D^{-6}\left(\sqrt{\frac{pK^2}{n}}\wedge\sqrt{p}\right)$, then $\mathrm{MMSE}_{\leq D}=\frac{1}{K}-\frac{1+o(1)}{K^{2}}$.
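The condition $\zeta_n < 1$ and the threshold in the "In particular" clause can be checked numerically. The Python sketch below (all function names are hypothetical) evaluates $\zeta_n$ and solves $\zeta_n = 1$ for $\bar{\Delta}^2$, which gives $\sqrt{p\,\min(K^2/n, 1)}/(D^4(1+D)^2)$; for $D \geq 1$ this lies within a factor $((1+D)/D)^2 \leq 4$ of $D^{-6}\left(\sqrt{pK^2/n}\wedge\sqrt{p}\right)$. It also computes, by exact enumeration, the mean squared error of the constant guess $1/K$ for the indicator $\mathbf{1}\{z_1 = z_2\}$ that two points with independent uniform labels share a cluster. Assuming this indicator is the quantity estimated in $\mathrm{MMSE}_{\leq D}$ (an assumption, since the definition is not reproduced here), $\frac{1}{K}-\frac{1}{K^2}$ is what the trivial estimator achieves, so the theorem would say that degree-$D$ polynomials do essentially no better.

```python
import math
from fractions import Fraction
from itertools import product


def zeta_n(delta_bar, D, p, n, K):
    """zeta_n = delta_bar^4 * D^8 * (1 + D)^4 / p * max(n / K^2, 1), as in Theorem 1."""
    return delta_bar**4 * D**8 * (1 + D) ** 4 / p * max(n / K**2, 1.0)


def critical_delta_sq(D, p, n, K):
    """Value of delta_bar^2 for which zeta_n == 1 (definition solved for delta_bar^2)."""
    return math.sqrt(p / (D**8 * (1 + D) ** 4 * max(n / K**2, 1.0)))


def stated_threshold(D, p, n, K):
    """D^-6 * min(sqrt(p K^2 / n), sqrt(p)), the scale in the 'In particular' clause."""
    return D**-6 * min(math.sqrt(p * K**2 / n), math.sqrt(p))


def trivial_mse(K):
    """Exact MSE of the constant guess 1/K for x = 1{z1 == z2}, z1, z2 i.i.d. uniform."""
    pairs = list(product(range(K), repeat=2))
    total = sum((Fraction(int(a == b)) - Fraction(1, K)) ** 2 for a, b in pairs)
    return total / len(pairs)


if __name__ == "__main__":
    D, p, n, K = 2, 10**6, 10**4, 100  # illustrative values with p >= n
    c = critical_delta_sq(D, p, n, K)
    print("critical delta_bar^2:", c, "stated scale:", stated_threshold(D, p, n, K))
    print("zeta_n at the critical value:", zeta_n(math.sqrt(c), D, p, n, K))
    print("trivial MSE:", trivial_mse(K), "vs 1/K - 1/K^2:", Fraction(1, K) - Fraction(1, K**2))
```

Solving $\zeta_n = 1$ in closed form makes the link between the two conditions explicit: the $D^{-6}$ scaling in the "In particular" clause is the critical value up to the bounded factor $((1+D)/D)^2$, which the $\ll$ absorbs.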

Figures (2)

  • Figure 1: The graph $\mathcal{G}^{-}_{\gamma}$ (on the left), and the corresponding graph $\mathcal{V}_{\gamma}$ (on the right).
  • Figure 2: The graph $\mathcal{G}_{\gamma'}$ (on the left), and the corresponding graph $\mathcal{V}_{\gamma'}$ (on the right).

Theorems & Definitions (45)

  • Definition 1
  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • Corollary 1
  • Proposition 1
  • Lemma 1
  • Lemma 2
  • Proof of Lemma lem:condition_topology
  • ...and 35 more