Mean-Field Analysis for Learning Subspace-Sparse Polynomials with Gaussian Input

Ziang Chen; Rong Ge

TL;DR

This work analyzes the mean-field dynamics of SGD for learning subspace-sparse polynomials with Gaussian inputs, where the target depends only on the projection of the input onto a low-dimensional subspace. It gives a basis-free necessary condition for learnability, stated through a reflective property, and shows that the condition is nearly sufficient: under a slightly stronger assumption, a specialized training procedure that uses averaging and a tailored activation achieves exponential convergence at a dimension-free rate. The results connect to isotropic leap concepts, and the proofs use algebraic independence arguments to ensure non-degeneracy of the kernel. The findings offer theoretical insight into feature learning under mean-field dynamics and point to future work on closing the gap between the necessary and sufficient conditions and on extending the analysis to more general SGD variants.

Abstract

In this work, we study the mean-field flow for learning subspace-sparse polynomials using stochastic gradient descent and two-layer neural networks, where the input distribution is standard Gaussian and the output only depends on the projection of the input onto a low-dimensional subspace. We establish a necessary condition for SGD-learnability, involving both the characteristics of the target function and the expressiveness of the activation function. In addition, we prove that the condition is almost sufficient, in the sense that a condition slightly stronger than the necessary condition can guarantee the exponential decay of the loss functional to zero.
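The setting described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the paper's training procedure: the polynomial `h_star`, the `tanh` activation, the helper names, and the plain SGD step with step size scaled by the width are all assumptions made for illustration. Only the Gaussian input, the dependence of the target on a low-dimensional projection, the two-layer architecture, and the first-layer initialization $\mathcal{N}(0,\frac{1}{d}I_d)$ come from the text.

```python
import math
import random


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def random_subspace_basis(d, p, rng):
    """Orthonormal basis of a random p-dimensional subspace V of R^d (Gram-Schmidt)."""
    basis = []
    while len(basis) < p:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for b in basis:
            c = dot(v, b)
            v = [vi - c * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(dot(v, v))
        if norm > 1e-12:
            basis.append([vi / norm for vi in v])
    return basis


def project(x, basis):
    """Coordinates of the projection of x onto V in the given basis."""
    return [dot(x, b) for b in basis]


def h_star(z):
    # Hypothetical low-dimensional polynomial, chosen only for illustration.
    return z[0] + z[0] * z[1]


def target(x, basis):
    """Subspace-sparse target: depends on x only through its projection onto V."""
    return h_star(project(x, basis))


def init_first_layer(n, d, rng):
    """n first-layer weight vectors drawn from N(0, I_d / d)."""
    std = 1.0 / math.sqrt(d)
    return [[rng.gauss(0.0, std) for _ in range(d)] for _ in range(n)]


def two_layer(x, a, W):
    """Two-layer network with mean-field 1/n scaling and tanh activation (assumed)."""
    return sum(ai * math.tanh(dot(w, x)) for ai, w in zip(a, W)) / len(a)


def sgd_step(x, y, a, W, eta):
    """One plain SGD step on 0.5 * (f(x) - y)^2, with step size eta * n (assumed)."""
    n = len(a)
    act = [math.tanh(dot(w, x)) for w in W]
    err = sum(ai * hi for ai, hi in zip(a, act)) / n - y
    new_a = [ai - eta * err * hi for ai, hi in zip(a, act)]
    new_W = [
        [wj - eta * err * ai * (1.0 - hi * hi) * xj for wj, xj in zip(w, x)]
        for ai, hi, w in zip(a, act, W)
    ]
    return new_a, new_W
```

Drawing `x` with independent standard Gaussian coordinates, setting `y = target(x, basis)`, and calling `sgd_step` repeatedly gives an online SGD loop. Adding to `x` any vector orthogonal to the subspace leaves `target` unchanged, which is the subspace-sparse structure the analysis relies on.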

Paper Structure (31 sections, 26 theorems, 155 equations, 1 algorithm)

Introduction
Our contribution and related works
Technical challenges
Organization
Preliminaries on Mean-Field Dynamics
Two-layer neural network and SGD
Mean-field dynamics
Necessary Condition for SGD-Learnability
Reflective Property
Proof Sketch for Theorem \ref{thm:main_necessary}
Sufficient Condition for SGD-Learnability
Training Procedure and Convergence Guarantee
Proof Sketch for Theorem \ref{thm:sufficient}
Algebraic independence of $\hat{u}_i$
Non-degeneracy of $\hat{M}(\mathbf{a}, t)$
...and 16 more sections

Key Result

Theorem 3.4

Suppose that Assumption \ref{asp:necessary} holds with $\rho_w\sim\mathcal{N}(0,\frac{1}{d}I_d)$, and that $h^*:V\to\mathbb{R}$ satisfies the reflective property with respect to some subspace $S\subset V$ and activation function $\sigma$. Then for any $T>0$, there exists a constant $C>0$ depending only on [...] where $h_{S^\perp}^*(z_{S^\perp}) = \mathbb{E}_{z_S}[h^*(z)]$. In particular, if $h^*(z)$ is not inde[...]

Theorems & Definitions (54)

Remark 3.2
Definition 3.3: Reflective property
Theorem 3.4
Theorem 3.5
Theorem 3.6
Theorem 4.3
Proposition 4.4
Theorem 4.5
Theorem 4.6: Jacobian criterion [beecken2013algebraic]
Lemma 4.7
...and 44 more
