Start Making Sense(s): A Developmental Probe of Attention Specialization Using Lexical Ambiguity
Pamela D. Rivière, Sean Trott
TL;DR
This work investigates how self-attention heads in Transformer LMs develop specialized roles in lexical disambiguation by tracking their developmental trajectories across Pythia checkpoints. It combines psycholinguistic stimuli (RAW-C) with targeted QK perturbations and causal ablations to identify heads whose attention to disambiguating cues covaries with disambiguation performance, and then assesses the robustness of these heads to stimulus perturbations. The study finds early developmental milestones at which attention to disambiguating cues increases, with the larger model (410M) showing more robust and generalizable disambiguation heads; ablations confirm causal contributions, especially in the smaller model (14M). The results highlight the developmental perspective as a powerful lens for understanding contextualization mechanisms and raise questions about generalization across random seeds and model scales.
Abstract
Despite an in-principle understanding of self-attention matrix operations in Transformer language models (LMs), it remains unclear precisely how these operations map onto interpretable computations or functions, and how or when individual attention heads develop specialized attention patterns. Here, we present a pipeline to systematically probe attention mechanisms, and we illustrate its value by leveraging lexical ambiguity (where a single word has multiple meanings) to isolate attention mechanisms that contribute to word sense disambiguation. We take a "developmental" approach. First, using publicly available Pythia LM checkpoints, we identify inflection points in disambiguation performance for each LM in the suite; in Pythia-14M and Pythia-410M (hereafter 14M and 410M), we identify heads whose attention to disambiguating words covaries with overall disambiguation performance across development. We then stress-test the robustness of these heads to stimulus perturbations: in 14M, we find limited robustness, but in 410M, we identify multiple heads with surprisingly generalizable behavior. Next, in a causal analysis, we find that ablating the target heads demonstrably impairs disambiguation performance, particularly in 14M. We additionally reproduce the developmental analyses of 14M across all of its random seeds. Together, these results suggest that disambiguation benefits from a constellation of mechanisms, some of which (especially in 14M) are highly sensitive to the position and part-of-speech of the disambiguating cue, and that larger models (410M) may contain heads with more robust disambiguation behavior. They also join a growing body of work highlighting the value of adopting a developmental perspective when probing LM mechanisms.
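As a rough illustration of the covariation step described in the abstract, the following minimal sketch ranks attention heads by how closely their attention to the disambiguating word tracks disambiguation performance across training checkpoints. It assumes that a per-checkpoint summary of each head's attention to the cue, and a per-checkpoint disambiguation score, have already been computed. Pearson correlation is used here as a stand-in statistic, and "largest single-step gain" is a crude stand-in for inflection-point detection; neither is claimed to be the authors' method. The function names (`pearson`, `largest_jump`, `rank_heads`) and the toy numbers are hypothetical and are not results from the paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences.

    Returns 0.0 if either sequence is constant (correlation undefined).
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def largest_jump(performance):
    """Index of the checkpoint that follows the largest single-step gain.

    A crude proxy for an "inflection point" (assumption, not the paper's method).
    """
    gains = [b - a for a, b in zip(performance, performance[1:])]
    return max(range(len(gains)), key=gains.__getitem__) + 1


def rank_heads(cue_attention, performance):
    """Rank heads by correlation between cue attention and performance.

    cue_attention maps a hypothetical (layer, head) key to a list with one
    attention-to-cue value per checkpoint, aligned with `performance`.
    Returns (head, r) pairs sorted from most to least positively correlated.
    """
    scores = {head: pearson(series, performance)
              for head, series in cue_attention.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy, made-up values for four checkpoints (not data from the paper).
    performance = [0.10, 0.12, 0.50, 0.55]
    cue_attention = {
        (0, 1): [0.1, 0.2, 0.3, 0.4],  # rises with performance
        (1, 0): [0.4, 0.3, 0.2, 0.1],  # falls as performance rises
    }
    print("largest jump at checkpoint", largest_jump(performance))
    for head, r in rank_heads(cue_attention, performance):
        print(head, round(r, 3))
```

Correlating across checkpoints, rather than within a single checkpoint, is what makes this a developmental probe: a head scores highly only if its attention to the cue rises and falls together with the model's disambiguation ability over training.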
