From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks
Clémentine C. J. Dominé, Nicolas Anguita, Alexandra M. Proca, Lukas Braun, Daniel Kunin, Pedro A. M. Mediano, Andrew M. Saxe
TL;DR
The paper gives an exact analytical treatment of learning dynamics in deep linear networks under lambda-balanced initializations, revealing a controllable spectrum between rich feature learning and lazy, kernel-like behavior. By deriving closed-form solutions for the gradient flow of key statistics and for the finite-width Neural Tangent Kernel (NTK), it shows how relative layer scaling shapes internal representations, the evolution of singular values, and the NTK, with implications for continual, reversal, and transfer learning as well as fine-tuning. The results clarify how initialization interacts with architecture to determine whether representations become task-specific or task-agnostic, and offer practical guidance for choosing initializations that favor a desired learning regime. The work also connects to neuroscience by relating these learning regimes to representation learning.
Abstract
Biological and artificial neural networks develop internal representations that enable them to perform complex tasks. In artificial networks, the effectiveness of these models relies on their ability to build task-specific representations, a process influenced by interactions among datasets, architectures, initialization strategies, and optimization algorithms. Prior studies highlight that different initializations can place networks in either a lazy regime, where representations remain static, or a rich/feature learning regime, where representations evolve dynamically. Here, we examine how initialization influences learning dynamics in deep linear neural networks, deriving exact solutions for lambda-balanced initializations, which are defined by the relative scale of weights across layers. These solutions capture the evolution of representations and the Neural Tangent Kernel across the spectrum from the rich to the lazy regimes. Our findings deepen the theoretical understanding of the impact of weight initialization on learning regimes, with implications for continual learning, reversal learning, and transfer learning, relevant to both neuroscience and practical applications.
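To make the notion of a lambda-balanced initialization concrete, the following is a minimal NumPy sketch for a two-layer linear network. It assumes the balance condition takes the form W2^T W2 - W1 W1^T = lambda * I, which is the usual way relative layer scale is formalized for linear networks; the abstract itself does not spell out the formula. The function names `lambda_balanced_init` and `balance_gap`, the `scale` parameter, and the SVD-based construction are illustrative choices, not the authors' sampling procedure.

```python
import numpy as np


def lambda_balanced_init(n_in, n_hidden, n_out, lam, scale=0.1, seed=0):
    """Build (W1, W2) with W2.T @ W2 - W1 @ W1.T = lam * I (assumed definition).

    W1 has shape (n_hidden, n_in) and W2 has shape (n_out, n_hidden).
    """
    if n_hidden > min(n_in, n_out):
        raise ValueError("this construction needs n_hidden <= min(n_in, n_out)")
    rng = np.random.default_rng(seed)

    # Matrices with orthonormal columns, from reduced QR of Gaussian draws.
    U, _ = np.linalg.qr(rng.standard_normal((n_out, n_hidden)))
    V, _ = np.linalg.qr(rng.standard_normal((n_in, n_hidden)))
    R, _ = np.linalg.qr(rng.standard_normal((n_hidden, n_hidden)))

    # s holds the singular values of the end-to-end map W2 @ W1.
    s = scale * np.abs(rng.standard_normal(n_hidden))

    # Split each s into per-layer singular values a (for W2) and b (for W1)
    # such that a * b = s and a**2 - b**2 = lam.
    root = np.sqrt(lam**2 + 4.0 * s**2)
    a = np.sqrt(np.maximum(root + lam, 0.0) / 2.0)
    b = np.sqrt(np.maximum(root - lam, 0.0) / 2.0)

    W2 = U @ np.diag(a) @ R.T
    W1 = R @ np.diag(b) @ V.T
    return W1, W2


def balance_gap(W1, W2, lam):
    """Largest absolute entry of W2.T @ W2 - W1 @ W1.T - lam * I."""
    h = W1.shape[0]
    return np.max(np.abs(W2.T @ W2 - W1 @ W1.T - lam * np.eye(h)))


W1, W2 = lambda_balanced_init(n_in=5, n_hidden=3, n_out=4, lam=2.0)
print("balance gap:", balance_gap(W1, W2, lam=2.0))
```

In this sketch, a positive `lam` makes the second layer larger than the first, a negative `lam` does the opposite, and `lam = 0` gives layers of equal scale. The end-to-end singular values `s` are set separately from `lam`, so the relative layer scale can be varied without changing the initial input-output map.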
