Precise Dynamics of Diagonal Linear Networks: A Unifying Analysis by Dynamical Mean-Field Theory
Sota Nishiyama, Masaaki Imaizumi
TL;DR
This work develops a unified dynamical mean-field theory (DMFT) framework for the gradient-flow dynamics of diagonal linear networks (DLNs), reducing the high-dimensional learning dynamics to a low-dimensional, self-consistent process. The authors identify distinct dynamical regimes governed by the initialization scale, derive fixed-point characterizations that interpolate between minimum-norm and $\ell_1$-biased solutions, and establish a precise trade-off between generalization and convergence speed. Using singular perturbation theory, they analyze the learning timescales and find lazy and rich phases for large initialization and search and descent phases for small initialization, with connections to grokking. They also provide a rigorous theory for truncated DLNs and validate the DMFT predictions numerically on Gaussian and real data, supporting their robustness and universality. Overall, the work presents DMFT as a powerful tool for predicting implicit biases and temporal structure in high-dimensional neural dynamics, with implications for initialization and training efficiency across architectures.
Abstract
Diagonal linear networks (DLNs) are a tractable model that captures several nontrivial behaviors in neural network training, such as initialization-dependent solutions and incremental learning. These phenomena are typically studied in isolation, leaving the overall dynamics insufficiently understood. In this work, we present a unified analysis of various phenomena in the gradient flow dynamics of DLNs. Using Dynamical Mean-Field Theory (DMFT), we derive a low-dimensional effective process that captures the asymptotic gradient flow dynamics in high dimensions. Analyzing this effective process yields new insights into DLN dynamics, including loss convergence rates and their trade-off with generalization, and systematically reproduces many of the previously observed phenomena. These findings deepen our understanding of DLNs and demonstrate the effectiveness of the DMFT approach in analyzing high-dimensional learning dynamics of neural networks.
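To make the notion of initialization-dependent solutions concrete, the following is a minimal sketch, not the authors' setup. It assumes the parameterization $w = u \odot u - v \odot v$ with $u = v = \alpha$ at initialization, which is common in the DLN literature but is not stated in this abstract. It also approximates gradient flow by gradient descent with a small step size. The function name `train_dln`, the problem sizes, and the hyperparameters are all illustrative choices.

```python
import numpy as np


def train_dln(X, y, alpha, lr=0.01, steps=20000):
    """Small-step gradient descent as a proxy for gradient flow on a DLN.

    Assumption (not from the abstract): w = u*u - v*v, with u = v = alpha
    at initialization, so that w starts at zero for every alpha.
    """
    n, d = X.shape
    u = np.full(d, float(alpha))
    v = np.full(d, float(alpha))
    for _ in range(steps):
        w = u * u - v * v
        # Gradient of the loss (1/2n) * ||X w - y||^2 with respect to w.
        grad_w = X.T @ (X @ w - y) / n
        # Chain rule: dL/du = 2 * u * grad_w and dL/dv = -2 * v * grad_w.
        u, v = u - lr * 2.0 * u * grad_w, v + lr * 2.0 * v * grad_w
    return u * u - v * v


# Illustrative underdetermined problem (n < d) with a sparse ground truth.
rng = np.random.default_rng(0)
n, d = 40, 100
w_star = np.zeros(d)
w_star[:3] = 1.0
X = rng.standard_normal((n, d))
y = X @ w_star  # noiseless labels

w_small = train_dln(X, y, alpha=1e-3)  # small initialization scale
w_large = train_dln(X, y, alpha=1.0)   # large initialization scale

for name, w in [("small init", w_small), ("large init", w_large)]:
    rel_res = np.linalg.norm(X @ w - y) / np.linalg.norm(y)
    print(f"{name}: relative residual = {rel_res:.2e}, "
          f"l1 norm = {np.abs(w).sum():.3f}, l2 norm = {np.linalg.norm(w):.3f}")
```

Under these assumptions, both runs should fit the training data, while the small-initialization run is expected to end at a solution with a smaller $\ell_1$ norm than the large-initialization run. This is consistent with the interpolation between $\ell_1$-biased and minimum-norm solutions described above, though the sketch says nothing about the DMFT analysis itself.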
