Implicit High-Order Moment Tensor Estimation and Learning Latent Variable Models

Ilias Diakonikolas, Daniel M. Kane

TL;DR

The paper tackles learning high-dimensional latent-variable models via the method of moments and introduces a general implicit moment tensor estimation framework to efficiently compute high-order moment tensors. The authors develop a sequential tensor computation approach with recursive pseudo-projections, enabling $\mathrm{poly}(d,k)$-time learning for mixtures of linear regressions, mixtures of spherical Gaussians (density and parameter estimation under optimal separation), and one-hidden-layer neural networks with positive weights. Key contributions include a formal algorithmic result for implicit tensor computation and multiple learning-theoretic applications that substantially improve prior runtimes and separation requirements. The framework leverages Hermite analysis and low-dimensional subspace projections to overcome the combinatorial blow-up of high-order moments, with significant implications for density estimation and structure learning in latent-variable models. This methodology broadens the toolkit for efficient high-dimensional latent-variable learning and suggests avenues for applying implicit tensor techniques to other probabilistic models and architectures.

Abstract

We study the task of learning latent-variable models. A common algorithmic technique for this task is the method of moments. Unfortunately, moment-based approaches are hampered by the fact that the moment tensors of super-constant degree cannot even be written down in polynomial time. Motivated by such learning applications, we develop a general efficient algorithm for implicit moment tensor computation. Our framework generalizes the work of [LL21-opt], which developed an efficient algorithm for the specific moment tensors that arise in clustering mixtures of spherical Gaussians. By leveraging our implicit moment estimation algorithm, we obtain the first $\mathrm{poly}(d, k)$-time learning algorithms for the following models.

  • Mixtures of Linear Regressions: We give a $\mathrm{poly}(d, k, 1/\epsilon)$-time algorithm for this task, where $\epsilon$ is the desired error.
  • Mixtures of Spherical Gaussians: For density estimation, we give a $\mathrm{poly}(d, k, 1/\epsilon)$-time learning algorithm, where $\epsilon$ is the desired total variation error, under the condition that the means lie in a ball of radius $O(\sqrt{\log k})$. For parameter estimation, we give a $\mathrm{poly}(d, k, 1/\epsilon)$-time algorithm under the optimal mean separation of $\Omega(\log^{1/2}(k/\epsilon))$.
  • Positive Linear Combinations of Non-Linear Activations: We give a general algorithm for this task with complexity $\mathrm{poly}(d, k) \cdot g(\epsilon)$, where $\epsilon$ is the desired error and the function $g$ depends on the Hermite concentration of the target class of functions. Specifically, for positive linear combinations of ReLU activations, our algorithm has complexity $\mathrm{poly}(d, k) \cdot 2^{\mathrm{poly}(1/\epsilon)}$.
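
To make the obstacle concrete, the following minimal sketch contrasts explicit and implicit access to an empirical moment tensor. It is an illustration only, not the paper's algorithm: it relies on the standard identity $\langle \mathbb{E}[x^{\otimes m}], v^{\otimes m} \rangle = \mathbb{E}[\langle x, v \rangle^m]$, and all function names (`explicit_moment_tensor`, `contract_explicit`, `contract_implicit`) are hypothetical.

```python
import itertools
import random


def explicit_moment_tensor(samples, m):
    """Empirical order-m moment tensor, stored entry by entry (d**m entries)."""
    d = len(samples[0])
    tensor = {}
    for idx in itertools.product(range(d), repeat=m):
        total = 0.0
        for x in samples:
            prod = 1.0
            for i in idx:
                prod *= x[i]
            total += prod
        tensor[idx] = total / len(samples)
    return tensor


def contract_explicit(tensor, v):
    """Contract the stored tensor with v in every mode: <M, v^{(x) m}>."""
    out = 0.0
    for idx, value in tensor.items():
        weight = value
        for i in idx:
            weight *= v[i]
        out += weight
    return out


def contract_implicit(samples, v, m):
    """Same quantity as the average of <x, v>**m; the tensor is never formed."""
    total = 0.0
    for x in samples:
        inner = sum(xi * vi for xi, vi in zip(x, v))
        total += inner ** m
    return total / len(samples)


rng = random.Random(0)
d, m, n = 4, 3, 200  # small illustrative sizes (assumptions)
samples = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
v = [rng.gauss(0.0, 1.0) for _ in range(d)]

tensor = explicit_moment_tensor(samples, m)
explicit = contract_explicit(tensor, v)
implicit = contract_implicit(samples, v, m)
print(len(tensor), explicit, implicit)
```

The explicit route stores $d^m$ numbers, which is super-polynomial once $m$ grows with the problem size, while the implicit route costs $O(nd)$ arithmetic operations for any $m$. The paper's framework handles far more general tensor computations than this single contraction.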

Paper Structure

This paper contains 47 sections, 22 theorems, 85 equations, 1 figure.

Key Result

Theorem 1.1

Let $F$ be a $k$-MLR distribution on $\mathbb{R}^{d+1}$ with $B,\sigma \leq 1$, where $B = \max_i \|\beta_i\|_2$ and $\sigma>0$ is the standard deviation of the Gaussian noise. There exists an algorithm that draws $N=\mathrm{poly}(k,d)(1/\epsilon)^{O(\sigma^{-2})}$ samples from $F$, runs in ...
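
For readers who want to experiment with the setting of Theorem 1.1, here is a small sampler sketch. It assumes the usual definition of a mixture of linear regressions, which this page does not spell out: $x \sim N(0, I_d)$, a component $i$ is drawn with probability `weights[i]`, and $y = \langle \beta_i, x \rangle + N(0, \sigma^2)$. The function name `sample_mlr` and its parameters are hypothetical.

```python
import random


def sample_mlr(betas, weights, sigma, n, seed=0):
    """Draw n pairs (x, y) from a k-MLR model (assumed standard definition).

    betas   : list of k regression vectors in R^d (the theorem assumes norm <= 1)
    weights : mixing weights of the k components
    sigma   : standard deviation of the additive Gaussian noise
    """
    rng = random.Random(seed)
    d = len(betas[0])
    data = []
    for _ in range(n):
        # Pick the hidden component, then draw the covariates and the label.
        i = rng.choices(range(len(betas)), weights=weights)[0]
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        y = sum(b * xi for b, xi in zip(betas[i], x)) + rng.gauss(0.0, sigma)
        data.append((x, y))
    return data


# Two components in R^3 with B = 1 and sigma = 0.5.
example = sample_mlr([[1.0, 0.0, 0.0], [0.0, -1.0, 0.0]], [0.5, 0.5], 0.5, 5)
for x, y in example:
    print(x, y)
```

Each sample is a point in $\mathbb{R}^{d+1}$, matching the ambient dimension in the theorem statement; the hidden component index is discarded, which is what makes the model a latent-variable one.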

Figures (1)

  • Figure 1: The upper circuit represents the original Sequential Tensor Computation (STC). The markings next to the circuit elements represent the complexity of the elements being dealt with. The original inputs are vectors (lines). Then many of the middle components deal with $2$-tensors (squares), and finally the circuit elements on the right deal with $3$-tensors (cubes). In an STC of higher degree, one might need to use even higher degree tensors which quickly become computationally unmanageable. Fortunately, we are able to simulate this computation using the circuit below, keeping track of the projected values of each circuit element. This keeps the computational complexity low by reducing everything to vectors of dimension at most $\dim(\Phi)$ (represented by short lines).
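
The idea of tracking only projected values can be illustrated on a toy case. The sketch below is a simplification and not the paper's pseudo-projection procedure: it assumes an ordinary orthonormal basis of an $r$-dimensional subspace and shows that the coordinates of $x \otimes x$ in the induced basis $\{u_a \otimes u_b\}$ equal products of the projected coordinates $\langle u_a, x \rangle \langle u_b, x \rangle$. All names are hypothetical.

```python
import random


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def orthonormal_basis(d, r, rng):
    """Gram-Schmidt applied to random Gaussian vectors in R^d."""
    basis = []
    while len(basis) < r:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for u in basis:
            c = dot(u, v)
            v = [vi - c * ui for vi, ui in zip(v, u)]
        norm = dot(v, v) ** 0.5
        if norm > 1e-8:
            basis.append([vi / norm for vi in v])
    return basis


def projected_square_explicit(x, basis):
    """Form the full d-by-d tensor x (x) x, then project it (d*d work per entry)."""
    d = len(x)
    full = [[x[i] * x[j] for j in range(d)] for i in range(d)]
    return [
        [sum(ua[i] * full[i][j] * ub[j] for i in range(d) for j in range(d))
         for ub in basis]
        for ua in basis
    ]


def projected_square_implicit(x, basis):
    """Project x to r coordinates first, then take products in the small space."""
    z = [dot(u, x) for u in basis]
    return [[za * zb for zb in z] for za in z]


rng = random.Random(0)
d, r = 6, 2  # illustrative sizes (assumptions)
basis = orthonormal_basis(d, r, rng)
x = [rng.gauss(0.0, 1.0) for _ in range(d)]
full_route = projected_square_explicit(x, basis)
short_route = projected_square_implicit(x, basis)
print(full_route)
print(short_route)
```

Both routes give the same $r \times r$ array, but the second never touches an object with $d^2$ entries. The lower circuit in Figure 1 plays an analogous role for a whole sequence of tensor operations, keeping every intermediate value in dimension at most $\dim(\Phi)$.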

Theorems & Definitions (56)

  • Theorem 1.1: Density Estimation for $k$-MLR
  • Theorem 1.2: Density Estimation for Spherical $k$-GMMs with Bounded Means
  • Theorem 1.3: Clustering Mixtures of Spherical Gaussians under Optimal Separation
  • Theorem 1.4: PAC Learning $\mathcal{C}_{\sigma, d, k}$
  • Corollary 1.5
  • Corollary 1.6
  • Remark 1.7
  • Proposition 1.8: Implicit Moment Tensor Computation, Informal
  • Definition 2.1: Normalized Probabilist's Hermite Polynomial
  • Definition 2.2: Normalized Hermite Tensor
  • ...and 46 more
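
The statement of Definition 2.1 is not reproduced on this page. Assuming it follows the standard convention $h_n = He_n / \sqrt{n!}$, where $He_n$ is the probabilist's Hermite polynomial, the sketch below evaluates $h_n$ by the normalized three-term recurrence and numerically checks orthonormality under $N(0,1)$. The function names and the quadrature settings are hypothetical choices.

```python
import math


def normalized_hermite(n, x):
    """h_n(x) = He_n(x) / sqrt(n!) via the normalized recurrence
    h_{k+1}(x) = (x * h_k(x) - sqrt(k) * h_{k-1}(x)) / sqrt(k + 1)."""
    h_prev, h = 0.0, 1.0  # h_{-1} := 0 and h_0 = 1
    for k in range(n):
        h_prev, h = h, (x * h - math.sqrt(k) * h_prev) / math.sqrt(k + 1)
    return h


def gaussian_expectation(f, lo=-12.0, hi=12.0, steps=4800):
    """Trapezoid-rule approximation of E[f(X)] for X ~ N(0, 1)."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * dx
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * f(x) * math.exp(-0.5 * x * x)
    return total * dx / math.sqrt(2.0 * math.pi)


# Gram matrix of h_0, ..., h_4 under the standard Gaussian measure.
gram = [
    [gaussian_expectation(lambda t, a=a, b=b: normalized_hermite(a, t) * normalized_hermite(b, t))
     for b in range(5)]
    for a in range(5)
]
for row in gram:
    print([round(entry, 6) for entry in row])
```

Under this normalization the family is orthonormal with respect to the Gaussian measure, so the Gram matrix should be numerically close to the identity.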