Fast Escape, Slow Convergence: Learning Dynamics of Phase Retrieval under Power-Law Data

Guillaume Braun, Bruno Loureiro, Ha Quang Minh, Masaaki Imaizumi

TL;DR

This work analyzes the learning dynamics of nonlinear phase retrieval under anisotropic, power-law spectra and shows that anisotropy induces an infinite hierarchy of coupled summary statistics. By combining a Duhamel representation with a Volterra reduction, the authors reveal a universal three-phase learning trajectory: fast escape from mediocrity, slower convergence of macroscopic statistics, and spectral-tail learning in low-variance directions. They derive explicit MSE scaling laws tied to the spectral exponent $a$, supported by numerical experiments, and provide a rigorous framework bridging nonlinear regression with anisotropic data to neural scaling phenomena. The results illuminate how spectral tails reshape learning dynamics and establish the first rigorous scaling laws for nonlinear regression with anisotropic inputs, offering a potential path to understanding broader learning systems under realistic data spectra.

Abstract

Scaling laws describe how learning performance improves with data, compute, or training time, and have become a central theme in modern deep learning. We study this phenomenon in a canonical nonlinear model: phase retrieval with anisotropic Gaussian inputs whose covariance spectrum follows a power law. Unlike the isotropic case, where dynamics collapse to a two-dimensional system, anisotropy yields a qualitatively new regime in which an infinite hierarchy of coupled equations governs the evolution of the summary statistics. We develop a tractable reduction that reveals a three-phase trajectory: (i) fast escape from low alignment, (ii) slow convergence of the summary statistics, and (iii) spectral-tail learning in low-variance directions. From this decomposition, we derive explicit scaling laws for the mean-squared error, showing how spectral decay dictates convergence times and error curves. Experiments confirm the predicted phases and exponents. These results provide the first rigorous characterization of scaling laws in nonlinear regression with anisotropic data, highlighting how anisotropy reshapes learning dynamics.
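The setting described above can be explored numerically with a small simulation. The following is only a minimal sketch under stated assumptions, not the authors' code: it assumes noiseless labels $y = (w^\star \cdot x)^2$, a per-sample loss $\tfrac{1}{4}((w \cdot x)^2 - y)^2$, eigenvalues $\lambda_i \propto i^{-a}$ normalized to sum to one, and a random initialization. The function names (`power_law_spectrum`, `population_mse`, `online_sgd`) are hypothetical. The population MSE is evaluated in closed form from the standard Gaussian fourth-moment identity, using the quantities $s = w^\top Q w$, $s^\star = w^{\star\top} Q w^\star$, and $u = w^\top Q w^\star$.

```python
import numpy as np


def power_law_spectrum(d, a):
    """Eigenvalues lambda_i proportional to i^(-a), normalized to sum to 1 (assumed normalization)."""
    lam = np.arange(1, d + 1, dtype=float) ** (-a)
    return lam / lam.sum()


def population_mse(w, w_star, lam):
    """E[((w.x)^2 - (w_star.x)^2)^2] for x ~ N(0, diag(lam)).

    Uses the Gaussian moment identities E[p^4] = 3 s^2 and
    E[p^2 q^2] = s s_star + 2 u^2 for jointly Gaussian p = w.x, q = w_star.x.
    """
    s = np.sum(lam * w * w)
    s_star = np.sum(lam * w_star * w_star)
    u = np.sum(lam * w * w_star)
    return 3 * s**2 + 3 * s_star**2 - 2 * s * s_star - 4 * u**2


def online_sgd(d=200, a=1.25, eta=5e-3, steps=20000, seed=0, log_every=1000):
    """One-pass SGD on the assumed loss (1/4) * ((w.x)^2 - (w_star.x)^2)^2."""
    rng = np.random.default_rng(seed)
    lam = power_law_spectrum(d, a)

    # Target and initial weights, both rescaled so that w^T Q w = 1 (assumed convention).
    w_star = rng.standard_normal(d)
    w_star /= np.sqrt(np.sum(lam * w_star**2))
    w = rng.standard_normal(d)
    w /= np.sqrt(np.sum(lam * w**2))

    history = []
    for t in range(steps):
        x = np.sqrt(lam) * rng.standard_normal(d)  # fresh sample x ~ N(0, diag(lam))
        pred = w @ x
        target = (w_star @ x) ** 2
        grad = (pred**2 - target) * pred * x  # gradient of the per-sample loss in w
        w -= eta * grad
        if t % log_every == 0:
            history.append((t, population_mse(w, w_star, lam)))
    return history


if __name__ == "__main__":
    for step, mse in online_sgd():
        print(f"step {step:6d}  population MSE {mse:.4e}")
```

Plotting the logged MSE on log-log axes for several values of `a` is one way to look for the slow, non-exponential decay the abstract describes. The step size and horizon here are illustrative choices and may need tuning for other dimensions or exponents.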

Paper Structure

This paper contains 83 sections, 43 theorems, 259 equations, 6 figures.

Key Result

Proposition 1

Let $x \sim \mathcal{N}(0, Q) \in \mathbb{R}^d$, where $Q \in \mathbb{R}^{d \times d}$ is a symmetric positive definite diagonal matrix. Let $w, w^\star \in \mathbb{R}^d$. The population loss can be rewritten as

Figures (6)

  • Figure 1: Evolution of the MSE during training with online SGD for different spectral exponents $a$ (log-log scale). For $a>1$, convergence is markedly slower than the exponential decay seen in the isotropic case, reflecting the difficulty of learning directions associated with small eigenvalues.
  • Figure 2: Evolution of the MSE, $u(t)$, and $s(t)$ under population gradient descent (log-log scale). Parameters: $a=1.25$, $d=1000$, $\eta=10^{-2}$, $T=10^{7}$, $\epsilon=0.05$.
  • Figure 3: Evolution of the correlation $u(t)$ for different exponents $a$ ($d=1000$, $\eta=10^{-3}$).
  • Figure 4: Comparison between the MSE obtained from the population dynamics and the approximation used to derive the scaling law (th.) (log-log scale). Parameters: $d=300$, $\eta=10^{-3}$.
  • Figure 5: Dynamics of online SGD for different $a$.
  • ...and 1 more figure

Theorems & Definitions (77)

  • Proposition 1
  • Proposition 2
  • Remark 1
  • Remark 2
  • Proposition 3
  • Theorem 1: Phases I and II
  • Remark 3
  • Remark 4
  • Proposition 4
  • Theorem 2: Phase III: spectral-tail learning
  • ...and 67 more