Fast Escape, Slow Convergence: Learning Dynamics of Phase Retrieval under Power-Law Data
Guillaume Braun, Bruno Loureiro, Ha Quang Minh, Masaaki Imaizumi
TL;DR
This work analyzes the learning dynamics of nonlinear phase retrieval with anisotropic Gaussian inputs whose covariance spectrum follows a power law, and shows that anisotropy induces an infinite hierarchy of coupled summary statistics. By combining a Duhamel representation with a Volterra reduction, the authors identify a three-phase learning trajectory: fast escape from mediocrity (low initial alignment), slower convergence of the macroscopic summary statistics, and spectral-tail learning in low-variance directions. They derive explicit scaling laws for the mean-squared error (MSE) in terms of the spectral exponent $a$ and support them with numerical experiments. The authors present these as the first rigorous scaling laws for nonlinear regression with anisotropic inputs, and as a step toward understanding neural scaling phenomena under realistic data spectra.
Abstract
Scaling laws describe how learning performance improves with data, compute, or training time, and have become a central theme in modern deep learning. We study this phenomenon in a canonical nonlinear model: phase retrieval with anisotropic Gaussian inputs whose covariance spectrum follows a power law. Unlike the isotropic case, where dynamics collapse to a two-dimensional system, anisotropy yields a qualitatively new regime in which an infinite hierarchy of coupled equations governs the evolution of the summary statistics. We develop a tractable reduction that reveals a three-phase trajectory: (i) fast escape from low alignment, (ii) slow convergence of the summary statistics, and (iii) spectral-tail learning in low-variance directions. From this decomposition, we derive explicit scaling laws for the mean-squared error, showing how spectral decay dictates convergence times and error curves. Experiments confirm the predicted phases and exponents. These results provide the first rigorous characterization of scaling laws in nonlinear regression with anisotropic data, highlighting how anisotropy reshapes learning dynamics.
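The setting described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' code: it assumes a diagonal covariance with eigenvalues $\lambda_k = k^{-a}$, a noiseless teacher $y = (w_\star \cdot x)^2$, a student of the same form trained by mini-batch online SGD on the squared loss, and hypothetical function names and hyperparameters chosen only for illustration. The closed-form population error uses Isserlis' theorem for Gaussian inputs, a standard identity rather than a formula taken from the paper.

```python
import numpy as np


def power_law_spectrum(d, a):
    """Eigenvalues lambda_k = k^(-a) for k = 1..d (unnormalized; an assumption)."""
    return np.arange(1, d + 1, dtype=float) ** (-a)


def population_mse(w, w_star, lam):
    """E[((w.x)^2 - (w_star.x)^2)^2] for x ~ N(0, diag(lam)).

    With q = w' L w, r = w_star' L w_star, m = w' L w_star, Isserlis' theorem
    gives E[u^4] = 3 q^2, E[v^4] = 3 r^2, E[u^2 v^2] = q r + 2 m^2, hence
    the error equals 3 q^2 + 3 r^2 - 2 q r - 4 m^2.
    """
    q = np.sum(lam * w * w)
    r = np.sum(lam * w_star * w_star)
    m = np.sum(lam * w * w_star)
    return 3 * q**2 + 3 * r**2 - 2 * q * r - 4 * m**2


def online_sgd(w0, w_star, lam, lr=0.005, batch=64, steps=200, seed=0):
    """Mini-batch online SGD on 0.5 * ((w.x)^2 - y)^2 with fresh samples each step.

    Returns the final weights and the population MSE after every step.
    Step size, batch size, and step count are illustrative defaults.
    """
    rng = np.random.default_rng(seed)
    w = w0.copy()
    sqrt_lam = np.sqrt(lam)
    errors = np.empty(steps + 1)
    errors[0] = population_mse(w, w_star, lam)
    for t in range(steps):
        # Fresh anisotropic Gaussian batch: x ~ N(0, diag(lam)).
        x = rng.standard_normal((batch, lam.size)) * sqrt_lam
        u = x @ w
        y = (x @ w_star) ** 2
        # Gradient of 0.5 * (u^2 - y)^2 with respect to w, averaged over the batch.
        grad = (2.0 * (u**2 - y) * u) @ x / batch
        w -= lr * grad
        errors[t + 1] = population_mse(w, w_star, lam)
    return w, errors


if __name__ == "__main__":
    d, a = 50, 1.5  # illustrative dimension and spectral exponent
    rng = np.random.default_rng(0)
    lam = power_law_spectrum(d, a)
    w_star = rng.standard_normal(d)
    w_star /= np.linalg.norm(w_star)
    w0 = rng.standard_normal(d)
    w0 /= np.linalg.norm(w0)
    _, errors = online_sgd(w0, w_star, lam)
    print("initial MSE:", errors[0], "final MSE:", errors[-1])
```

Plotting `errors` on log-log axes for several values of `a` is one way to look for the phases and exponents the abstract describes. A run this short and this small is not expected to reproduce the paper's scaling laws quantitatively.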
