Sudden Drops in the Loss: Syntax Acquisition, Phase Transitions, and Simplicity Bias in MLMs
Angelica Chen, Ravid Shwartz-Ziv, Kyunghyun Cho, Matthew L. Leavitt, Naomi Saphra
TL;DR
This paper examines how grammatical capabilities emerge during masked language model (MLM) pretraining by tracking Syntactic Attention Structure (SAS). It identifies a brief structure onset, in which SAS spikes alongside a steep drop in loss, followed by a capabilities onset, in which grammatical performance improves. This ordering suggests a causal link from internal syntactic representations to external language competence. Using a syntactic regularizer to manipulate SAS during training, the authors show that SAS is necessary for complex grammatical capabilities but competes with an alternative strategy: brief early suppression of SAS can accelerate learning, while mis-timed suppression may hinder long-term performance. The study frames these dynamics in terms of phase transitions and simplicity bias, offers causal evidence from training interventions, and highlights implications for optimization, curriculum design, and interpretability in neural NLP models.
Abstract
Most interpretability research in NLP focuses on understanding the behavior and features of a fully trained model. However, certain insights into model behavior may only be accessible by observing the trajectory of the training process. We present a case study of syntax acquisition in masked language models (MLMs) that demonstrates how analyzing the evolution of interpretable artifacts throughout training deepens our understanding of emergent behavior. In particular, we study Syntactic Attention Structure (SAS), a naturally emerging property of MLMs wherein specific Transformer heads tend to focus on specific syntactic relations. We identify a brief window in pretraining when models abruptly acquire SAS, concurrent with a steep drop in loss. This breakthrough precipitates the subsequent acquisition of linguistic capabilities. We then examine the causal role of SAS by manipulating SAS during training, and demonstrate that SAS is necessary for the development of grammatical capabilities. We further find that SAS competes with other beneficial traits during training, and that briefly suppressing SAS improves model quality. These findings offer an interpretation of a real-world example of both simplicity bias and breakthrough training dynamics.
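To make the idea of heads that "focus on specific syntactic relations" concrete, here is a minimal sketch of one way such structure could be scored for a single sentence. It is an illustrative assumption, not the metric used in the paper: the function names (`head_attachment_scores`, `sas_proxy`), the argmax-attention rule, and the toy attention values are all hypothetical. For each attention head, the sketch counts how often a word's most-attended token is its syntactic parent, then reports the best head's score.

```python
from typing import List, Sequence, Tuple

# Assumed layout: attention[head][query_token][key_token], each row summing to 1.
Attention = Sequence[Sequence[Sequence[float]]]


def head_attachment_scores(attention: Attention,
                           arcs: List[Tuple[int, int]]) -> List[float]:
    """For each head, return the fraction of (dependent, parent) arcs where
    the dependent's most-attended token is its syntactic parent."""
    if not arcs:
        raise ValueError("need at least one dependency arc")
    scores = []
    for head in attention:
        hits = 0
        for dependent, parent in arcs:
            row = head[dependent]
            # Token index receiving the largest attention weight from `dependent`.
            predicted = max(range(len(row)), key=row.__getitem__)
            hits += int(predicted == parent)
        scores.append(hits / len(arcs))
    return scores


def sas_proxy(attention: Attention, arcs: List[Tuple[int, int]]) -> float:
    """Hypothetical summary score: the best single head's attachment score."""
    return max(head_attachment_scores(attention, arcs))


# Toy sentence "the cat sleeps" with arcs the -> cat and cat -> sleeps.
toy_arcs = [(0, 1), (1, 2)]
toy_attention = [
    # Head 0: each dependent attends mostly to its parent.
    [[0.1, 0.8, 0.1], [0.1, 0.1, 0.8], [0.3, 0.3, 0.4]],
    # Head 1: each token attends mostly to itself.
    [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]],
]
toy_scores = head_attachment_scores(toy_attention, toy_arcs)
toy_sas = sas_proxy(toy_attention, toy_arcs)
```

Tracking a score like this across training checkpoints is one way an abrupt rise in syntactic attention could be lined up against the loss curve; the paper's own measurement may differ in how heads, relations, and sentences are aggregated.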
