Learning Multi-Index Models with Neural Networks via Mean-Field Langevin Dynamics

Alireza Mousavi-Hosseini; Denny Wu; Murat A. Erdogdu

TL;DR

The paper analyzes learning multi-index models with a two-layer neural network trained by the mean-field Langevin algorithm (MFLA). Under mild sub-Gaussian data assumptions, it identifies an effective dimension $d_{\mathrm{eff}}$ that governs both sample complexity (almost linear in $d_{\mathrm{eff}}$) and computational cost (possibly exponential in $d_{\mathrm{eff}}$ in the worst case). It pursues two directions: (i) in the Euclidean setting, it uses $d_{\mathrm{eff}}$ to obtain dimension-adaptive guarantees, although the computational cost may still depend exponentially on $d_{\mathrm{eff}}$ in the worst case; (ii) in the Riemannian setting, it constrains the weights to a compact manifold with positive Ricci curvature (e.g., the hypersphere), where a uniform log-Sobolev constant yields polynomial-time convergence under suitable assumptions. The work also analyzes how covariance structure (spiked models and power-law spectra) affects $d_{\mathrm{eff}}$, and thus sample and computational costs, and outlines steps toward lower bounds and future directions. Overall, the paper shows that mean-field training adapts to latent low-dimensional structure: it learns with statistical efficiency, while the analysis delineates the computational limits and identifies structured parameter spaces in which convergence is tractable.

Abstract

We study the problem of learning multi-index models in high-dimensions using a two-layer neural network trained with the mean-field Langevin algorithm. Under mild distributional assumptions on the data, we characterize the effective dimension $d_{\mathrm{eff}}$ that controls both sample and computational complexity by utilizing the adaptivity of neural networks to latent low-dimensional structures. When the data exhibit such a structure, $d_{\mathrm{eff}}$ can be significantly smaller than the ambient dimension. We prove that the sample complexity grows almost linearly with $d_{\mathrm{eff}}$, bypassing the limitations of the information and generative exponents that appeared in recent analyses of gradient-based feature learning. On the other hand, the computational complexity may inevitably grow exponentially with $d_{\mathrm{eff}}$ in the worst-case scenario. Motivated by improving computational complexity, we take the first steps towards polynomial time convergence of the mean-field Langevin algorithm by investigating a setting where the weights are constrained to be on a compact manifold with positive Ricci curvature, such as the hypersphere. There, we study assumptions under which polynomial time convergence is achievable, whereas similar assumptions in the Euclidean setting lead to exponential time complexity.
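
To make the training procedure concrete, the following is a minimal sketch of a time-discretized, finite-width mean-field Langevin update: noisy gradient descent with weight decay on the neurons of a two-layer network. It is an illustration under assumptions not taken from the paper: the activation (`tanh`), the squared loss with the 1/2 convention, the function names (`mfla_step`, `run_mfla`, `predict`), and all hyperparameter values are hypothetical choices, and the paper's own parameterization of the neurons may differ.

```python
import numpy as np

def mfla_step(W, X, y, eta, lam, beta, rng):
    """One noisy gradient step applied to all N neurons (rows of W).

    Assumed model: f(x) = (1/N) * sum_i tanh(<w_i, x>), squared loss 0.5*(f - y)^2.
    lam is the weight-decay strength and beta the inverse temperature.
    """
    n = X.shape[0]
    act = np.tanh(X @ W.T)                 # (n, N) neuron outputs
    resid = act.mean(axis=1) - y           # (n,) residuals f(x_j) - y_j
    # Gradient of the first variation of the empirical loss at each neuron:
    # (1/n) * sum_j resid_j * tanh'(<w_i, x_j>) * x_j
    grad = ((resid[:, None] * (1.0 - act**2)).T @ X) / n   # (N, d)
    noise = rng.standard_normal(W.shape)
    # Langevin update: drift from loss + weight decay, plus Gaussian noise
    return W - eta * (grad + lam * W) + np.sqrt(2.0 * eta / beta) * noise

def run_mfla(X, y, num_neurons=64, steps=200, eta=0.1, lam=1e-3, beta=1e4, seed=0):
    """Run the sketch above from a Gaussian initialization (hypothetical defaults)."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((num_neurons, X.shape[1]))
    for _ in range(steps):
        W = mfla_step(W, X, y, eta, lam, beta, rng)
    return W

def predict(W, X):
    """Mean-field prediction: average of the neuron outputs."""
    return np.tanh(X @ W.T).mean(axis=1)
```

A typical call would draw `X` as an `(n, d)` array, build targets from a low-dimensional projection of `X`, and run `W = run_mfla(X, y)`. The noise scale `sqrt(2 * eta / beta)` is what separates this update from plain gradient descent with weight decay; setting `beta = np.inf` removes the noise entirely.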

Paper Structure (28 sections, 29 theorems, 154 equations, 2 figures, 1 table)

Introduction
Our Contributions
Related Works
Mean-field Langevin dynamics.
Learning low-dimensional targets.
Notation.
Preliminaries: Optimization in Measure Space
Statistical model.
Optimization in measure space.
Learning Multi-index Models in the Euclidean Setting
Statistical and Computational Complexity of MFLA
Utilizing the Effective Dimension
Spiked covariance.
Scaling laws under power-law spectra.
Polynomial Time Convergence in the Riemannian Setting
...and 13 more sections

Key Result

Proposition 2

Suppose $\rho$ is $C_\rho$-Lipschitz. Then for any $\mu \in \mathcal{P}_2(\mathbb{R}^{2d+2})$, the probability measure $\nu_{\mu} \propto \exp(-\beta \hat{\mathcal{J}}_\lambda'[\mu])$, with $\hat{\mathcal{J}}_\lambda'$ given by \eqref{eq:first_var_euclidean}, satisfies the LSI \eqref{eq:LSI} with constant

Figures (2)

Figure 1: (a) $d_{\mathrm{eff}}$ according to Corollary \ref{cor:power-law}. (b) Test loss from MFLA; details in Appendix \ref{app:experiment}.
Figure 2: Generalization gap measured by varying the effective dimension.

Theorems & Definitions (32)

Definition 1: Effective dimension
Proposition 2
Theorem 3
Corollary 4
Corollary 5
Corollary 6
Example 7
Proposition 8
Proposition 9
Theorem 10
...and 22 more
