Causal Analysis of Syntactic Agreement Mechanisms in Neural Language Models
Matthew Finlayson, Aaron Mueller, Sebastian Gehrmann, Stuart Shieber, Tal Linzen, Yonatan Belinkov
TL;DR
This work examines how neural language models implement subject-verb agreement by applying causal mediation analysis to Transformer-based architectures. Treating neurons as mediators and intervening on the input, the study identifies two distinct agreement mechanisms in GPT-2 and Transformer-XL and a more unified mechanism in XLNet; larger models do not necessarily show larger agreement margins. It further shows that the neurons most influential for agreement are shared across similar syntactic structures, and that natural indirect effect (NIE) patterns vary with syntactic structure and layer, suggesting that syntactic information is distributed and represented differently across architectures. These findings tie agreement behavior to specific neuron-level mediators and bear on how models are analyzed and compared across architectures.
Abstract
Targeted syntactic evaluations have demonstrated the ability of language models to perform subject-verb agreement in difficult contexts. To elucidate the mechanisms by which the models accomplish this behavior, we apply causal mediation analysis to pre-trained neural language models. We investigate the magnitude of models' preferences for grammatical inflections, as well as whether neurons process subject-verb agreement similarly across sentences with different syntactic structures. We uncover similarities and differences across architectures and model sizes; notably, larger models do not necessarily learn stronger preferences. We also observe two distinct mechanisms for producing subject-verb agreement depending on the syntactic structure of the input sentence. Finally, we find that language models rely on similar sets of neurons when given sentences with similar syntactic structure.
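To make the idea of neurons as mediators concrete, the following is a minimal sketch of how a total effect and a per-neuron natural indirect effect could be computed on a toy network. It is not the paper's implementation: the two-feature input encoding, the weights `W_IN` and `W_OUT`, and all function names are hypothetical, and the response variable is assumed to be the ratio of the probability of the incorrect verb form to that of the correct one.

```python
import math

# Hypothetical toy "language model": 2 input features -> 3 hidden neurons -> 2 verb forms.
# Input [1, 0] stands for a singular subject, [0, 1] for a plural subject (an assumption).
W_IN = [[1.5, -1.0], [-0.5, 2.0], [0.3, 0.3]]      # one row per hidden neuron
W_OUT = [[2.0, -1.0, 0.1], [-2.0, 1.0, 0.1]]       # row 0: singular verb, row 1: plural verb


def hidden(x):
    """Hidden activations for input x."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W_IN]


def verb_probs(h):
    """Softmax over the two verb forms given hidden activations h."""
    logits = [sum(w * hi for w, hi in zip(row, h)) for row in W_OUT]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def response(h, correct, incorrect):
    """Response variable y = p(incorrect verb) / p(correct verb)."""
    p = verb_probs(h)
    return p[incorrect] / p[correct]


def total_effect(x_null, x_swap, correct, incorrect):
    """Relative change in y when the whole input is swapped (e.g. subject number flipped)."""
    y_null = response(hidden(x_null), correct, incorrect)
    y_swap = response(hidden(x_swap), correct, incorrect)
    return y_swap / y_null - 1.0


def natural_indirect_effect(x_null, x_swap, neuron, correct, incorrect):
    """Relative change in y when only one neuron takes its value under the swapped input."""
    h_null = hidden(x_null)
    h_patched = list(h_null)
    h_patched[neuron] = hidden(x_swap)[neuron]  # intervene on the mediator only
    y_null = response(h_null, correct, incorrect)
    return response(h_patched, correct, incorrect) / y_null - 1.0


singular, plural = [1.0, 0.0], [0.0, 1.0]
te = total_effect(singular, plural, correct=0, incorrect=1)
nies = [natural_indirect_effect(singular, plural, n, correct=0, incorrect=1) for n in range(3)]
ranking = sorted(range(3), key=lambda n: nies[n], reverse=True)
print("total effect:", te)
print("NIE per neuron:", nies)
print("neurons ranked by NIE:", ranking)
```

In this toy setup the third neuron responds identically to both inputs, so patching it leaves the output unchanged, while the other two neurons carry the number information. Ranking neurons by their indirect effect in this way is what allows the sets of most influential neurons to be compared across syntactic structures.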
