Implicit High-Order Moment Tensor Estimation and Learning Latent Variable Models
Ilias Diakonikolas, Daniel M. Kane
TL;DR
The paper tackles learning high-dimensional latent-variable models via the method of moments and introduces a general implicit moment tensor estimation framework to efficiently compute high-order moment tensors. The authors develop a sequential tensor computation approach with recursive pseudo-projections, enabling poly$(d,k)$-time learning for mixtures of linear regressions, mixtures of spherical Gaussians (density and parameter estimation under optimal separation), and one-hidden-layer neural networks with positive weights. Key contributions include a formal algorithmic result for implicit tensor computation and multiple learning-theoretic applications that substantially improve prior runtimes and separation requirements. The framework leverages Hermite analysis and low-dimensional subspace projections to overcome the combinatorial blow-up of high-order moments, with significant implications for density estimation and structure learning in latent-variable models. This methodology broadens the toolkit for efficient high-dimensional latent-variable learning and suggests avenues for applying implicit tensor techniques to other probabilistic models and architectures.
Abstract
We study the task of learning latent-variable models. A common algorithmic technique for this task is the method of moments. Unfortunately, moment-based approaches are hampered by the fact that the moment tensors of super-constant degree cannot even be written down in polynomial time. Motivated by such learning applications, we develop a general efficient algorithm for {\em implicit moment tensor computation}. Our framework generalizes the work of~\cite{LL21-opt} which developed an efficient algorithm for the specific moment tensors that arise in clustering mixtures of spherical Gaussians. By leveraging our implicit moment estimation algorithm, we obtain the first $\mathrm{poly}(d, k)$-time learning algorithms for the following models. * {\bf Mixtures of Linear Regressions} We give a $\mathrm{poly}(d, k, 1/ε)$-time algorithm for this task, where $ε$ is the desired error. * {\bf Mixtures of Spherical Gaussians} For density estimation, we give a $\mathrm{poly}(d, k, 1/ε)$-time learning algorithm, where $ε$ is the desired total variation error, under the condition that the means lie in a ball of radius $O(\sqrt{\log k})$. For parameter estimation, we give a $\mathrm{poly}(d, k, 1/ε)$-time algorithm under the {\em optimal} mean separation of $Ω(\log^{1/2}(k/ε))$. * {\bf Positive Linear Combinations of Non-Linear Activations} We give a general algorithm for this task with complexity $\mathrm{poly}(d, k) g(ε)$, where $ε$ is the desired error and the function $g$ depends on the Hermite concentration of the target class of functions. Specifically, for positive linear combinations of ReLU activations, our algorithm has complexity $\mathrm{poly}(d, k) 2^{\mathrm{poly}(1/ε)}$.
