
The Operator Origins of Neural Scaling Laws: A Generalized Spectral Transport Dynamics of Deep Learning

Abstract

Modern deep networks operate in a rough, finite-regularity regime where Jacobian-induced operators exhibit heavy-tailed spectra and strong basis drift. In this work, we derive a unified operator-theoretic description of neural training dynamics directly from gradient descent. Starting from the exact evolution in function space, we apply Kato perturbation theory to obtain a rigorous system of coupled mode ODEs and show that, after coarse-graining, these dynamics converge to a spectral transport--dissipation PDE in which a transport term captures eigenbasis drift and a nonlocal kernel encodes coupling between modes. We prove that neural training preserves functional regularity, forcing the drift to take an asymptotic power-law form. In the weak-coupling regime, which is naturally induced by spectral locality and SGD noise, the PDE admits self-similar solutions with a resolution frontier, polynomial amplitude growth, and power-law dissipation. This structure yields explicit scaling-law exponents, explains the geometry of double descent, and shows that the effective training time obeys a power-law relation up to a slowly varying factor. Finally, we show that NTK training and feature learning arise as two limits of the same PDE: the vanishing-drift limit recovers lazy dynamics, while finite drift produces representation drift. Our results provide a unified spectral framework connecting operator geometry, optimization dynamics, and the universal scaling behavior of modern deep networks.
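
As a schematic illustration, with symbols that are assumptions rather than the paper's own notation, a spectral transport--dissipation PDE of the kind described above can be written as

$$
\partial_t \rho(\lambda, t) + \partial_\lambda\!\big[ v(\lambda, t)\, \rho(\lambda, t) \big]
= \int K(\lambda, \mu)\, \rho(\mu, t)\, d\mu \; - \; \gamma(\lambda)\, \rho(\lambda, t),
$$

where $\rho(\lambda, t)$ denotes a coarse-grained density of spectral modes, $v$ a drift velocity induced by eigenbasis rotation, $K$ a nonlocal coupling kernel, and $\gamma$ a dissipation rate. Under this reading, the lazy (NTK) limit corresponds to negligible drift and coupling, so that, as in linearized gradient flow, each mode relaxes at a rate set by its eigenvalue, while feature learning corresponds to a non-negligible drift term that transports spectral mass and reshapes the eigenbasis.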