Local geometry of high-dimensional mixture models: Effective spectral theory and dynamical transitions

Gerard Ben Arous; Reza Gheissari; Jiaoyang Huang; Aukosh Jagannath

TL;DR

The paper develops a precise spectral theory for the empirical Hessian and information matrices of high-dimensional loss landscapes, showing that the limiting bulk spectrum and outliers depend on the parameters only through a low-dimensional Gram summary $\mathbf{G}=(\mathbf{x},\boldsymbol{\mu})^\top(\mathbf{x},\boldsymbol{\mu})$. It then uses the fact that, under online SGD, these summaries follow an autonomous system of ODEs, which makes it possible to track spectral transitions along training trajectories, with explicit results for multi-class logistic regression on Gaussian mixtures. The framework covers a broad class of problems, including classification of Gaussian mixtures by one- and two-layer networks and regression on multi-index models, and yields both static results at initialization and dynamical outlier splitting along SGD, with sharp BBP-type thresholds and deterministic equivalents. The results indicate when informative spectral directions emerge during learning and how the interplay between bulk and outliers shapes optimization in high dimensions, with potential implications for generalization and training dynamics in structured models.

Abstract

We study the local geometry of empirical risks in high dimensions via the spectral theory of their Hessian and information matrices. We focus on settings where the data, $(Y_\ell)_{\ell =1}^n \in \mathbb{R}^d$, are i.i.d. draws of a $k$-Gaussian mixture model, and the loss depends on the projection of the data into a fixed number of vectors, namely $\mathbf{x}^\top Y$, where $\mathbf{x}\in \mathbb{R}^{d\times C}$ are the parameters, and $C$ need not equal $k$. This setting captures a broad class of problems such as classification by one and two-layer networks and regression on multi-index models. We provide exact formulas for the limits of the empirical spectral distribution and outlier eigenvalues and eigenvectors of such matrices in the proportional asymptotics limit, where the number of samples and dimension $n,d\to\infty$ and $n/d=\phi\in (0,\infty)$. These limits depend on the parameters $\mathbf{x}$ only through the summary statistic of the $(C+k)\times (C+k)$ Gram matrix of the parameters and class means, $\mathbf{G} = (\mathbf{x},\boldsymbol{\mu})^\top(\mathbf{x},\boldsymbol{\mu})$. It is known that under general conditions, when $\mathbf{x}$ is trained by online stochastic gradient descent, the evolution of these same summary statistics along training converges to the solution of an autonomous system of ODEs, called the effective dynamics. This enables us to connect the training dynamics to the spectral theory of these matrices generated with test data. We demonstrate our general results by analyzing the effective spectrum along the effective dynamics in the case of multi-class logistic regression. In this setting, the empirical Hessian and information matrices have substantially different spectra, each with their own static and even dynamical spectral transitions.
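As a finite-size illustration of the summary statistic described above, the following minimal NumPy sketch builds $\mathbf{G} = (\mathbf{x},\boldsymbol{\mu})^\top(\mathbf{x},\boldsymbol{\mu})$ and the empirical Hessian of a softmax cross-entropy loss on Gaussian mixture data. It is not the paper's construction: the function names (`gram_summary`, `sample_gmm`, `empirical_hessian`), the identity noise covariance, the absence of regularization, and the $1/n$ normalization are all assumptions made for the example. The sketch checks one consequence of the dependence on $\mathbf{G}$: rotating the parameters and class means by an orthogonal matrix leaves $\mathbf{G}$ unchanged, and if the noise is rotated as well, the Hessian eigenvalues are unchanged too.

```python
import numpy as np


def gram_summary(x, mu):
    """Return G = (x, mu)^T (x, mu), a (C + k) x (C + k) matrix."""
    z = np.concatenate([x, mu], axis=1)  # d x (C + k)
    return z.T @ z


def sample_gmm(mu, labels, noise):
    """Rows are Y_l = mu[:, labels[l]] + noise[l].

    Unit-covariance (isotropic) noise is an assumption of this sketch.
    """
    return mu[:, labels].T + noise  # n x d


def softmax(logits):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)


def empirical_hessian(x, Y):
    """Empirical Hessian in x of the unregularized softmax cross-entropy loss.

    For this loss the Hessian does not depend on the labels:
    H = (1/n) sum_l (diag(p_l) - p_l p_l^T) kron (Y_l Y_l^T), shape (C*d, C*d).
    """
    n, d = Y.shape
    C = x.shape[1]
    p = softmax(Y @ x)  # n x C class probabilities
    # Per-sample C x C weights diag(p) - p p^T.
    W = np.einsum("nc,cb->ncb", p, np.eye(C)) - np.einsum("na,nb->nab", p, p)
    H = np.einsum("nab,ni,nj->aibj", W, Y, Y) / n
    return H.reshape(C * d, C * d)


# Small hypothetical problem sizes, chosen only so the demo runs quickly.
rng = np.random.default_rng(0)
d, C, k, n = 20, 3, 2, 200
x = rng.normal(size=(d, C)) / np.sqrt(d)
mu = rng.normal(size=(d, k)) / np.sqrt(d)
labels = rng.integers(0, k, size=n)
noise = rng.normal(size=(n, d))

# Random orthogonal matrix from a QR factorization.
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))

G = gram_summary(x, mu)
G_rot = gram_summary(Q @ x, Q @ mu)

eigs = np.linalg.eigvalsh(empirical_hessian(x, sample_gmm(mu, labels, noise)))
eigs_rot = np.linalg.eigvalsh(
    empirical_hessian(Q @ x, sample_gmm(Q @ mu, labels, noise @ Q.T))
)

print("G shape:", G.shape)
print("max |G - G_rot|:", np.abs(G - G_rot).max())
print("max |eigs - eigs_rot|:", np.abs(eigs - eigs_rot).max())
```

Rotating the noise together with the parameters makes the check exact at finite $n$ and $d$. With fresh isotropic Gaussian noise, the same statement would hold only in distribution.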

Paper Structure

Figures (7)

Theorems & Definitions (79)