Gaussian Universality in Neural Network Dynamics with Generalized Structured Input Distributions

Jaeyong Bae, Hawoong Jeong

TL;DR

This work broadens the hidden manifold framework by incorporating general Gaussian mixtures into neural network inputs and shows that, after standardization and under weak correlations, SGD dynamics converge to the Gaussian benchmark predicted by the Gaussian Equivalence Property. The authors establish rigorous results on asymptotic Gaussianity and covariance consistency for block-dependent mixtures and provide thorough numerical evidence, including a Berry–Esseen-based bound and data collapse across dimensions. They further explore the limits of universality under increased input correlation and real-world data, identifying key drivers of dynamic deviations and offering practical guidance for when Gaussian-based analyses remain informative. Overall, the study strengthens the theoretical foundation of deep learning dynamics by revealing a robust form of universality driven by low-order moments rather than exact distributional form.

Abstract

Analyzing neural network dynamics via stochastic gradient descent (SGD) is crucial to building theoretical foundations for deep learning. Previous work has analyzed structured inputs within the \textit{hidden manifold model}, often under the simplifying assumption of a Gaussian distribution. We extend this framework by modeling inputs as Gaussian mixtures to better represent complex, real-world data. Through empirical and theoretical investigation, we demonstrate that with proper standardization, the learning dynamics converge to the behavior seen in the simple Gaussian case. This finding exhibits a form of universality, where diverse structured distributions yield results consistent with Gaussian assumptions, thereby strengthening the theoretical understanding of deep learning models.
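
To make the role of standardization concrete, the following is a minimal sketch, not the authors' pipeline. It draws scalar samples from a Gaussian mixture and rescales them to zero empirical mean and unit empirical variance. The mixture parameters, the helper names `sample_mixture`, `standardize`, and `excess_kurtosis`, and the use of per-coordinate empirical moments are all assumptions made for illustration; the excerpt does not specify the exact standardization procedure.

```python
import random
from statistics import fmean, pstdev, pvariance


def sample_mixture(n, means, stds, weights, rng):
    """Draw n scalar samples from a 1-D Gaussian mixture (hypothetical parameters)."""
    components = range(len(weights))
    samples = []
    for _ in range(n):
        k = rng.choices(components, weights=weights)[0]  # pick a component
        samples.append(rng.gauss(means[k], stds[k]))     # sample from it
    return samples


def standardize(xs):
    """Shift and rescale to zero empirical mean and unit empirical variance."""
    mu, sigma = fmean(xs), pstdev(xs)
    return [(x - mu) / sigma for x in xs]


def excess_kurtosis(zs):
    """Fourth standardized moment minus 3 (equals 0 for an exact Gaussian)."""
    mu, var = fmean(zs), pvariance(zs)
    return fmean([(z - mu) ** 4 for z in zs]) / var ** 2 - 3.0


rng = random.Random(0)
raw = sample_mixture(20000, means=[-2.0, 3.0], stds=[0.5, 1.5],
                     weights=[0.3, 0.7], rng=rng)
z = standardize(raw)

print(f"mean = {fmean(z):.3e}, variance = {pvariance(z):.6f}")
print(f"excess kurtosis = {excess_kurtosis(z):.3f}")
```

After this step the first two moments match those of a standard Gaussian by construction, while higher moments such as the kurtosis generally do not. That is the setting in which the abstract's universality claim is stated.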

Paper Structure

This paper contains 54 sections, 8 theorems, 179 equations, 16 figures, 1 table.

Key Result

Theorem 5.1

The Berry--Esseen theorem (berry1, berry2; see, e.g., berry3) for a sum of independent, non-identically distributed, zero-mean random variables $X_b$ states that if $\sum_b \mathrm{Var}(X_b) = 1$, then

$$\sup_{x \in \mathbb{R}} \left| \mathbb{P}\left( \sum_b X_b \le x \right) - \Phi(x) \right| \le C_{BE} \sum_b \mathbb{E}|X_b|^3,$$

where $\Phi(x)$ is the standard normal CDF and $C_{BE}$ is a universal constant.
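
As a rough numerical illustration of this kind of bound, the sketch below compares an empirical Kolmogorov distance with the classical Berry--Esseen form $\sup_x |\mathbb{P}(\sum_b X_b \le x) - \Phi(x)| \le C_{BE} \sum_b \mathbb{E}|X_b|^3$. Everything specific here is an assumption chosen for illustration: the block count, the scales `scales`, the choice of uniform summands, and the numerical value `C_BE = 0.56` used for the universal constant. It is not the block-dependent mixture setting of the paper.

```python
import math
import random
from statistics import NormalDist

C_BE = 0.56        # assumed numerical value for the universal constant
N_BLOCKS = 20      # hypothetical number of summands
N_TRIALS = 20000   # Monte Carlo repetitions

# Hypothetical non-identical scales a_b, normalised so sum_b Var(X_b) = 1.
raw = [1.0 + b / N_BLOCKS for b in range(N_BLOCKS)]
norm = math.sqrt(sum(w * w for w in raw))
scales = [w / norm for w in raw]

# X_b = a_b * U_b with U_b uniform on [-sqrt(3), sqrt(3)] (zero mean, unit
# variance), so E|X_b|^3 = |a_b|^3 * 3*sqrt(3)/4.
HALF_WIDTH = math.sqrt(3.0)
third_moment_sum = sum(abs(a) ** 3 for a in scales) * 3.0 * math.sqrt(3.0) / 4.0
bound = C_BE * third_moment_sum

rng = random.Random(0)
sums = sorted(
    sum(a * rng.uniform(-HALF_WIDTH, HALF_WIDTH) for a in scales)
    for _ in range(N_TRIALS)
)

# Empirical Kolmogorov distance sup_x |F_n(x) - Phi(x)| on the sorted sample.
phi = NormalDist().cdf
ks = max(
    max(abs((i + 1) / N_TRIALS - phi(s)), abs(i / N_TRIALS - phi(s)))
    for i, s in enumerate(sums)
)

print(f"empirical sup-distance = {ks:.4f}, Berry-Esseen bound = {bound:.4f}")
```

The empirical distance includes Monte Carlo sampling error of order $1/\sqrt{N_{\text{trials}}}$, so it only approximates the true supremum. For symmetric summands like these, the bound is typically loose.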

Figures (16)

  • Figure 1: Illustration of the hidden manifold model (goldt2020modeling) and our experimental scheme. The term ODE dynamics refers to the outcomes from ordinary differential equation (ODE) simulations, which align with SGD under a simple Gaussian input, since ODE represents the continuous-time limit of SGD under Gaussian input. In contrast, SGD dynamics describes the results obtained from running SGD with $C$ from various distributions. We consider $C$ drawn from a simple Gaussian, from heavy-tailed distributions such as the Lorentz distribution, and from general Gaussian mixtures as described in Sec. \ref{Method}.
  • Figure 2: Wasserstein--1 distance $W_{1}(\mathcal{P},\mathcal{N})$ for different values of $\alpha$ and number of components $q$, where $\mathcal{P}$ represents the dimension-wise Gaussian mixture distribution.
  • Figure 3: Examples of dynamics under unstandardized Gaussian mixtures with $q=2, q=16$, and $\alpha=0.01, 0.1, 1$, $\beta = 10$. Dynamics of (a) generalization error $\epsilon_g$, (b) covariance matrix $Q$, (c) covariance matrix $R$, and (d) weight of the second layer $v$. The SGD results are averaged over 5 runs. As a baseline, the ODE dynamics (solid line) align perfectly with SGD simulations on simple Gaussian inputs (dotted line). However, a significant divergence occurs between the ODE predictions and the actual SGD dynamics for unstandardized Gaussian mixtures (crosses), demonstrating the failure of the theory in this setting.
  • Figure 4: Examples of dynamics under standardized Gaussian mixtures with $q=2, q=16$, and $\alpha=0.01, 0.1, 1$, $\beta = 10$. Dynamics of (a) generalization error $\epsilon_g$, (b) covariance matrix $Q$, (c) covariance matrix $R$, and (d) weight of the second layer $v$. The SGD results are averaged over 5 runs. The close agreement between the ODE predictions (solid line) and SGD simulations (crosses) demonstrates that standardization effectively restores the universal dynamics. This supports the empirical claim that aligning low-order moments is sufficient to recover Gaussian learning dynamics in these settings.
  • Figure 5: Examples of dynamics under various distribution settings (parameterized in Table \ref{tab:dist}) after standardization. Dynamics of (a) generalization error $\epsilon_g$, (b) covariance matrix $Q$, (c) covariance matrix $R$, and (d) weight of the second layer $v$. The SGD results are averaged over 5 runs. The Lorentz (Cauchy) distribution is the sole, crucial exception. Its failure to converge, due to its lack of finite moments, provides strong evidence that the universality is specifically governed by the alignment of the moments.
  • ...and 11 more figures
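
The Wasserstein--1 distance reported in Figure 2 can be approximated for scalar samples as in the sketch below, which compares a standardized sample against the standard normal. This is a minimal one-dimensional illustration under stated assumptions: the helper names `standardize` and `w1_to_standard_normal`, the two-component mixture parameters, and the midpoint quantile rule are all hypothetical choices, and they do not correspond to the paper's $\alpha$ and $q$ settings.

```python
import random
from statistics import NormalDist, fmean, pstdev


def standardize(xs):
    """Shift and rescale to zero empirical mean and unit empirical variance."""
    mu, sigma = fmean(xs), pstdev(xs)
    return [(x - mu) / sigma for x in xs]


def w1_to_standard_normal(xs):
    """Approximate W1 between the empirical law of xs and N(0, 1).

    In one dimension W1 is the integral of |F^{-1}(u) - G^{-1}(u)| over
    u in (0, 1); here it is approximated by a midpoint rule that pairs
    the sorted sample with standard normal quantiles.
    """
    n = len(xs)
    inv = NormalDist().inv_cdf
    total = sum(abs(x - inv((i + 0.5) / n)) for i, x in enumerate(sorted(xs)))
    return total / n


rng = random.Random(1)
n = 5000

# Baseline: samples that are already Gaussian.
gauss = [rng.gauss(0.0, 1.0) for _ in range(n)]

# Hypothetical well-separated mixture: equal weights, means -3 and +3, std 0.3.
mixture = [rng.gauss(rng.choice([-3.0, 3.0]), 0.3) for _ in range(n)]

w1_gauss = w1_to_standard_normal(standardize(gauss))
w1_mix = w1_to_standard_normal(standardize(mixture))

print(f"W1(Gaussian sample, N(0,1)) = {w1_gauss:.4f}")
print(f"W1(standardized mixture, N(0,1)) = {w1_mix:.4f}")
```

Even after standardization, a strongly bimodal mixture stays far from the Gaussian in this metric. This illustrates that matching the first two moments does not make the input distribution itself Gaussian.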

Theorems & Definitions (8)

  • Theorem 5.1: Uniform Berry--Esseen bounds
  • Theorem 5.2
  • Lemma 5.3: Smooth-test approximation exchange
  • Theorem B.1: Smooth function bound with improved dimension dependence
  • Lemma B.2: Smooth-test approximation
  • Theorem C.1: Berry--Esseen Bound
  • Theorem C.2
  • Lemma C.3