Loss Landscape Degeneracy and Stagewise Development in Transformers
Jesse Hoogland, George Wang, Matthew Farrugia-Roberts, Liam Carroll, Susan Wei, Daniel Murfet
TL;DR
The paper examines how degeneracy in the local loss landscape, quantified by the local learning coefficient (LLC) from singular learning theory, tracks stagewise development in transformers. By monitoring the LLC during training of a transformer language model and an in-context linear regression transformer, the authors identify plateaus in the LLC that delineate developmental stages. Each stage coincides with interpretable shifts in internal structure (e.g., bigram/n-gram learning, use of positional information, induction circuit formation) and in input/output behavior (including in-context learning). They contrast LLC-based stage detection with Hessian-based metrics, showing that the LLC captures several stage boundaries that curvature metrics miss, and they provide methodological details on SGLD-based LLC estimation and local Bayesian free-energy reasoning. The findings suggest degeneracy as a unifying, setting-agnostic lens on how modern deep networks develop, with implications for developmental interpretability and for mechanistic insight into transformer computation. Overall, the work offers empirical evidence that loss landscape degeneracy is closely linked to the emergence of higher-level computational structures in transformers, and it motivates more principled, degeneracy-based analyses of deep learning development.
Abstract
Deep learning involves navigating a high-dimensional loss landscape over the neural network parameter space. Over the course of training, complex computational structures form and re-form inside the neural network, leading to shifts in input/output behavior. It is a priority for the science of deep learning to uncover principles governing the development of neural network structure and behavior. Drawing on the framework of singular learning theory, we propose that model development is deeply linked to degeneracy in the local geometry of the loss landscape. We investigate this link by monitoring loss landscape degeneracy throughout training, as quantified by the local learning coefficient, for a transformer language model and an in-context linear regression transformer. We show that training can be divided into distinct periods of change in loss landscape degeneracy, and that these changes in degeneracy coincide with significant changes in the internal computational structure and the input/output behavior of the transformers. This finding provides suggestive evidence that degeneracy and development are linked in transformers, underscoring the potential of a degeneracy-based perspective for understanding modern deep learning.
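To make "degeneracy as quantified by the local learning coefficient" concrete, the following is a minimal sketch of a sampling-based LLC estimate on two toy losses. It assumes the commonly used estimator form n·β·(E[L(w)] − L(w*)) with β = 1/log n, where the expectation is taken over a Langevin chain localized around w* by a quadratic penalty of strength gamma. The function name `estimate_llc`, the toy losses, and all hyperparameter values are illustrative assumptions, not taken from the paper. The sketch also uses exact gradients, so the minibatch noise of true SGLD is absent.

```python
import math
import random


def estimate_llc(loss, grad, w_star, n=10_000, gamma=1.0, eps=1e-4,
                 steps=20_000, burn_in=2_000, seed=0):
    """Toy LLC estimate: n * beta * (E[L(w)] - L(w_star)), beta = 1 / log(n).

    The expectation is approximated with a Langevin chain targeting
    exp(-n * beta * L(w) - (gamma / 2) * ||w - w_star||^2).
    All default values are illustrative assumptions.
    """
    rng = random.Random(seed)
    beta = 1.0 / math.log(n)
    w = list(w_star)
    total, count = 0.0, 0
    for step in range(steps):
        g = grad(w)  # gradient at the current point, before any update
        for i in range(len(w)):
            # Drift of the localized, tempered log-posterior.
            drift = n * beta * g[i] + gamma * (w[i] - w_star[i])
            w[i] += -0.5 * eps * drift + math.sqrt(eps) * rng.gauss(0.0, 1.0)
        if step >= burn_in:
            total += loss(w)
            count += 1
    return n * beta * (total / count - loss(w_star))


# Regular minimum: both directions are curved (theoretical LLC = d/2 = 1).
def regular_loss(w):
    return w[0] ** 2 + w[1] ** 2


def regular_grad(w):
    return [2.0 * w[0], 2.0 * w[1]]


# Degenerate minimum: the second direction is flat (theoretical LLC = 1/2).
def degenerate_loss(w):
    return w[0] ** 2


def degenerate_grad(w):
    return [2.0 * w[0], 0.0]


llc_regular = estimate_llc(regular_loss, regular_grad, [0.0, 0.0])
llc_degenerate = estimate_llc(degenerate_loss, degenerate_grad, [0.0, 0.0])
print(f"regular: {llc_regular:.2f}, degenerate: {llc_degenerate:.2f}")
```

Up to sampling noise and step-size bias, the regular loss should give an estimate near 1 and the loss with a flat direction an estimate near 0.5. The more degenerate minimum receives the lower value, which is the sense in which the LLC measures degeneracy. Applying the same idea to a transformer checkpoint would require minibatch gradients and careful tuning of the step size and localization strength.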
