The Local Learning Coefficient: A Singularity-Aware Complexity Measure
Edmund Lau, Zach Furman, George Wang, Daniel Murfet, Susan Wei
TL;DR
The paper argues that traditional, parameter-count-based complexity measures are inadequate for deep neural networks because the loss landscape contains singularities. It introduces the Local Learning Coefficient (LLC), a singularity-aware complexity measure grounded in Singular Learning Theory, and derives a local volume scaling $V(\epsilon) \propto \epsilon^{\lambda(w^*)} (-\log \epsilon)^{m(w^*)-1}$ that governs the geometry near a minimum $w^*$, where $\lambda(w^*)$ is the LLC and $m(w^*)$ its multiplicity, as well as a local free energy expansion $F_n(B_\gamma(w^*)) = n L_n(w^*) + \lambda(w^*) \log n + o_p(\log \log n)$. The authors develop a scalable LLC estimator based on a practical surrogate with a localizing prior and an SGLD-based sampling scheme, and validate it empirically on deep linear networks with up to $10^8$ parameters, ResNet models on CIFAR-10, and transformer language models, showing that training heuristics measurably change the LLC. They demonstrate that the LLC decreases with stronger implicit regularization and can reveal the effective simplicity of trained networks, offering a principled framework for reconciling deep networks' apparent complexity with parsimony. The work provides a data-distribution-sensitive lens for model selection and for understanding training dynamics, with the potential to illuminate phase transitions and emergent abilities in large-scale models.
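As a rough illustration of the volume scaling law, the sketch below estimates the exponent of $V(\epsilon)$ for two one-dimensional toy losses: a regular minimum $w^2$, where the exponent should be $d/2 = 1/2$, and a degenerate minimum $w^4$, where it should be $1/4$. This is not the paper's SGLD-based estimator. The function names (`local_volume`, `volume_exponent`), the grid resolution, and the range of $\epsilon$ are assumptions chosen for the example, and the log-log slope fit is only appropriate here because both toy losses have multiplicity $m = 1$, so the logarithmic factor vanishes.

```python
import numpy as np

def local_volume(loss, eps, radius=1.0, n_grid=2_000_001):
    """Approximate the volume of {w in [-radius, radius] : loss(w) < eps} on a uniform grid."""
    w = np.linspace(-radius, radius, n_grid)
    # Fraction of grid points below the threshold, times the length of the interval.
    return 2.0 * radius * np.mean(loss(w) < eps)

def volume_exponent(loss, eps_values):
    """Slope of log V(eps) against log eps; approximates lambda when m = 1."""
    vols = np.array([local_volume(loss, e) for e in eps_values])
    slope, _intercept = np.polyfit(np.log(eps_values), np.log(vols), 1)
    return slope

# Hypothetical toy losses with a minimum at w* = 0 and loss value 0 there.
eps_values = np.logspace(-6, -2, 9)
lam_regular = volume_exponent(lambda w: w**2, eps_values)   # regular minimum, d = 1
lam_singular = volume_exponent(lambda w: w**4, eps_values)  # degenerate minimum

print(f"regular  w^2: exponent ~ {lam_regular:.3f}")
print(f"singular w^4: exponent ~ {lam_singular:.3f}")
```

Both toy models have a single parameter, yet the degenerate minimum has the smaller exponent because the low-loss region around it shrinks more slowly as $\epsilon \to 0$. This is the sense in which the LLC, rather than the parameter count, measures effective complexity.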
Abstract
The Local Learning Coefficient (LLC) is introduced as a novel complexity measure for deep neural networks (DNNs). Recognizing the limitations of traditional complexity measures, the LLC leverages Singular Learning Theory (SLT), which has long recognized the significance of singularities in the loss landscape geometry. This paper provides an extensive exploration of the LLC's theoretical underpinnings, offering both a clear definition and intuitive insights into its application. Moreover, we propose a new scalable estimator for the LLC, which is then effectively applied across diverse architectures including deep linear networks up to 100M parameters, ResNet image models, and transformer language models. Empirical evidence suggests that the LLC provides valuable insights into how training heuristics might influence the effective complexity of DNNs. Ultimately, the LLC emerges as a crucial tool for reconciling the apparent contradiction between deep learning's complexity and the principle of parsimony.
