Implicit High-Order Moment Tensor Estimation and Learning Latent Variable Models

Ilias Diakonikolas; Daniel M. Kane

TL;DR

The paper tackles learning high-dimensional latent-variable models via the method of moments and introduces a general framework for implicit moment tensor estimation, which avoids writing down high-order moment tensors explicitly. The authors develop a sequential tensor computation approach with recursive pseudo-projections, enabling $\mathrm{poly}(d,k)$-time learning for mixtures of linear regressions, mixtures of spherical Gaussians (density estimation, and parameter estimation under optimal separation), and one-hidden-layer neural networks with positive weights. Key contributions include a formal algorithmic result for implicit tensor computation and several learning-theoretic applications that improve on prior runtimes and separation requirements. The framework leverages Hermite analysis and low-dimensional subspace projections to overcome the combinatorial blow-up of high-order moments. The methodology suggests avenues for applying implicit tensor techniques to other probabilistic models and architectures.
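As a small illustration of why Hermite analysis pairs naturally with moment methods for spherical Gaussian mixtures, the sketch below checks a standard identity (not taken from the paper): if $z \sim N(a, 1)$, then $\mathbf{E}[He_m(z)] = a^m$, where $He_m$ is the probabilist's Hermite polynomial. Consequently, for an identity-covariance mixture projected onto a unit direction, the degree-$m$ Hermite moment equals $\sum_i w_i a_i^m$, where $a_i$ is the projected mean of component $i$. The function name `hermite_moment_1d`, the weights, and the projected means are hypothetical choices made for this example; the expectation is computed with Gauss-Hermite quadrature rather than sampling.

```python
import numpy as np
from numpy.polynomial import hermite_e as He


def hermite_moment_1d(weights, projected_means, m, quad_deg=40):
    """E[He_m(z)] for z drawn from the 1-D mixture sum_i w_i N(a_i, 1).

    Hypothetical helper for illustration only. Uses Gauss-HermiteE quadrature,
    which is exact for polynomial integrands of degree < 2 * quad_deg.
    """
    nodes, qw = He.hermegauss(quad_deg)      # weight function exp(-x^2 / 2)
    qw = qw / np.sqrt(2.0 * np.pi)           # normalize to an N(0, 1) expectation
    coeffs = np.zeros(m + 1)
    coeffs[m] = 1.0                          # selects He_m in the HermiteE basis
    total = 0.0
    for w, a in zip(weights, projected_means):
        # z = g + a with g ~ N(0, 1), so integrate He_m(g + a) against N(0, 1)
        total += w * np.sum(qw * He.hermeval(nodes + a, coeffs))
    return total


# Assumed toy mixture: two components with projected means 1.5 and -0.5.
weights = np.array([0.3, 0.7])
projected_means = np.array([1.5, -0.5])
m = 4

lhs = hermite_moment_1d(weights, projected_means, m)
rhs = float(np.sum(weights * projected_means ** m))  # sum_i w_i a_i^m
```

Here `lhs` and `rhs` agree up to floating-point error, which is the one-dimensional shadow of the fact that Hermite moments of such a mixture expose the tensors $\sum_i w_i \mu_i^{\otimes m}$.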

Abstract

We study the task of learning latent-variable models. A common algorithmic technique for this task is the method of moments. Unfortunately, moment-based approaches are hampered by the fact that the moment tensors of super-constant degree cannot even be written down in polynomial time. Motivated by such learning applications, we develop a general efficient algorithm for {\em implicit moment tensor computation}. Our framework generalizes the work of~\cite{LL21-opt} which developed an efficient algorithm for the specific moment tensors that arise in clustering mixtures of spherical Gaussians. By leveraging our implicit moment estimation algorithm, we obtain the first $\mathrm{poly}(d, k)$-time learning algorithms for the following models.

* {\bf Mixtures of Linear Regressions} We give a $\mathrm{poly}(d, k, 1/\epsilon)$-time algorithm for this task, where $\epsilon$ is the desired error.

* {\bf Mixtures of Spherical Gaussians} For density estimation, we give a $\mathrm{poly}(d, k, 1/\epsilon)$-time learning algorithm, where $\epsilon$ is the desired total variation error, under the condition that the means lie in a ball of radius $O(\sqrt{\log k})$. For parameter estimation, we give a $\mathrm{poly}(d, k, 1/\epsilon)$-time algorithm under the {\em optimal} mean separation of $\Omega(\log^{1/2}(k/\epsilon))$.

* {\bf Positive Linear Combinations of Non-Linear Activations} We give a general algorithm for this task with complexity $\mathrm{poly}(d, k) g(\epsilon)$, where $\epsilon$ is the desired error and the function $g$ depends on the Hermite concentration of the target class of functions. Specifically, for positive linear combinations of ReLU activations, our algorithm has complexity $\mathrm{poly}(d, k) 2^{\mathrm{poly}(1/\epsilon)}$.
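To make the obstacle concrete: the order-$m$ moment tensor of a $d$-dimensional distribution has $d^m$ entries, so it cannot be stored explicitly once $m$ grows with the problem size. The sketch below shows the simplest instance of working with such a tensor implicitly. It is not the algorithm of the paper, only a minimal illustration under stated assumptions: the contraction of the empirical moment tensor $M_m = \frac{1}{n}\sum_i x_i^{\otimes m}$ with a vector $v$ along $m-1$ modes equals $\frac{1}{n}\sum_i \langle x_i, v\rangle^{m-1} x_i$, which costs $O(nd)$ time regardless of $m$. The function names `implicit_contract` and `explicit_contract_order3` and the synthetic data are hypothetical.

```python
import numpy as np


def implicit_contract(X, v, m):
    """Return M_m(v, ..., v, .) without forming the d^m-entry tensor M_m.

    X has shape (n, d); M_m is the empirical moment tensor (1/n) sum_i x_i^{(x) m}.
    The result is the length-d vector (1/n) sum_i <x_i, v>^(m-1) x_i.
    """
    proj = X @ v                              # inner products <x_i, v>, shape (n,)
    return (proj ** (m - 1)) @ X / X.shape[0]


def explicit_contract_order3(X, v):
    """Reference computation for m = 3 that materializes the d x d x d tensor."""
    n = X.shape[0]
    M3 = np.einsum("ni,nj,nk->ijk", X, X, X) / n
    return np.einsum("ijk,i,j->k", M3, v, v)


# Assumed synthetic data, small enough that the explicit tensor fits in memory.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
v = rng.standard_normal(5)

implicit = implicit_contract(X, v, 3)
explicit = explicit_contract_order3(X, v)

# The implicit route also works for degrees where the explicit tensor
# would be far too large to store, for example m = 12 with d = 5.
high_order = implicit_contract(X, v, 12)
```

The two order-3 results coincide up to floating-point error, while only the implicit version remains feasible as $m$ grows.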
