Dimension Reduction via Sum-of-Squares and Improved Clustering Algorithms for Non-Spherical Mixtures

Prashanti Anderson, Mitali Bafna, Rares-Darius Buhai, Pravesh K. Kothari, David Steurer

TL;DR

This work develops a new approach for clustering non-spherical Gaussian mixture models via a subroutine, based on the sum-of-squares method, that finds a low-dimensional separation-preserving projection of the input data.

Abstract

We develop a new approach for clustering non-spherical (i.e., arbitrary component covariances) Gaussian mixture models via a subroutine, based on the sum-of-squares method, that finds a low-dimensional separation-preserving projection of the input data. Our method gives a non-spherical analog of the classical dimension reduction, based on singular value decomposition, that forms a key component of the celebrated spherical clustering algorithm of Vempala and Wang [VW04] (in addition to several other applications). As applications, we obtain an algorithm to (1) cluster an arbitrary total-variation separated mixture of $k$ centered (i.e., zero-mean) Gaussians with $n\geq \operatorname{poly}(d) f(w_{\min}^{-1})$ samples and $\operatorname{poly}(n)$ time, and (2) cluster an arbitrary total-variation separated mixture of $k$ Gaussians with identical but arbitrary unknown covariance with $n \geq d^{O(\log w_{\min}^{-1})} f(w_{\min}^{-1})$ samples and $n^{O(\log w_{\min}^{-1})}$ time. Here, $w_{\min}$ is the minimum mixing weight of the input mixture, and $f$ does not depend on the dimension $d$. Our algorithms naturally extend to tolerating a dimension-independent fraction of arbitrary outliers. Before this work, the techniques in the state-of-the-art non-spherical clustering algorithms needed $d^{O(k)} f(w_{\min}^{-1})$ time and samples for clustering such mixtures. Our results may come as a surprise in the context of the $d^{\Omega(k)}$ statistical query lower bound [DKS17] for clustering non-spherical Gaussian mixtures. While this result is usually thought to rule out $d^{o(k)}$ cost algorithms for the problem, our results show that the lower bounds can in fact be circumvented for a remarkably general class of Gaussian mixtures.
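For orientation, the classical SVD-based dimension reduction that the abstract refers to can be sketched as follows. This is only an illustration of the spherical baseline in the spirit of [VW04], not the sum-of-squares subroutine developed in this paper. The function names (`svd_project`, `sample_spherical_mixture`) and the toy parameters (dimension 50, three components, mean separation along coordinate axes) are assumptions made for the example.

```python
import numpy as np

def svd_project(X, k):
    """Project the rows of X onto the span of its top-k right singular vectors.

    Illustrative sketch of the classical spherical-case projection;
    it is NOT the paper's sum-of-squares procedure.
    """
    # Best-fit k-dimensional subspace through the origin (no centering).
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    V = vt[:k].T  # d x k matrix with orthonormal columns
    return X @ V, V

def sample_spherical_mixture(means, n_per, rng):
    """Draw n_per samples from N(mean, I) for each row of `means` (toy generator)."""
    k, d = means.shape
    X = np.concatenate([m + rng.standard_normal((n_per, d)) for m in means])
    labels = np.repeat(np.arange(k), n_per)
    return X, labels

rng = np.random.default_rng(0)
d, k = 50, 3                      # hypothetical toy sizes
means = np.zeros((k, d))
means[0, 0] = means[1, 1] = means[2, 2] = 8.0

X, labels = sample_spherical_mixture(means, 500, rng)
Y, V = svd_project(X, k)          # Y has shape (1500, 3)

# Distances between component means before and after projection.
orig_gap = np.linalg.norm(means[0] - means[1])
proj_gap = np.linalg.norm((means[0] - means[1]) @ V)
```

In this spherical toy setting the projected gap `proj_gap` stays close to `orig_gap`, because the top singular subspace approximately contains the span of the component means. With arbitrary component covariances this guarantee fails in general, which is the gap the paper's separation-preserving projection is designed to fill.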

Paper Structure

This paper contains 46 sections, 41 theorems, 162 equations, 6 algorithms.

Key Result

Theorem 1.1

There exists an algorithm that takes as input an independent sample from a mixture of $k$ centered Gaussians with minimum weight $w_{\min}$ and at most an $\varepsilon \ll w_{\min}$ fraction of adversarially corrupted samples, and outputs a clustering such that in each cluster at most a $O(kw_{\min}^{-

Theorems & Definitions (98)

  • Theorem 1.1: Main Theorem 1, See \ref{thm:zero-mean-main} for Full Version
  • Theorem 1.2: Main Theorem 2, See \ref{thm:same-cov-clustering} for Full Version
  • Definition 2.1: Parameter Distance
  • Proposition 2.2: No Moment Matching with Centered Mixtures
  • proof
  • Proposition 2.3
  • proof
  • Definition 3.1: Sum-of-Squares Proofs
  • Definition 3.2: Pseudo-Distributions
  • Definition 3.3: Pseudo-Expectations
  • ...and 88 more