Open Sesame: Getting Inside BERT's Linguistic Knowledge
Yongjie Lin, Yi Chern Tan, Robert Frank
TL;DR
The paper investigates how BERT encodes syntactic structure, distinguishing positional (linear) information from hierarchical information. It combines diagnostic classifiers trained under a poverty-of-the-stimulus design with a novel attention-based confusion score that quantifies how self-attention encodes dependencies such as subject-verb agreement and reflexive anaphora. Across two BERT variants, lower layers encode positional information well, while higher layers shift to a hierarchically oriented encoding, though BERT does not show the sharp sensitivity to hierarchical structure that humans show when processing reflexive anaphora. These findings suggest that BERT's representations model linguistically relevant aspects of hierarchical structure, which may help explain its strong performance on structure-sensitive NLP tasks. The work also offers a concrete methodology for probing linguistic knowledge in transformer models.
Abstract
How and to what extent does BERT encode syntactically sensitive hierarchical information or positionally sensitive linear information? Recent work has shown that contextual representations like BERT perform well on tasks that require sensitivity to linguistic structure. We present here two studies that aim to provide a better understanding of the nature of BERT's representations. The first of these focuses on the identification of structurally defined elements using diagnostic classifiers, while the second explores BERT's representation of subject-verb agreement and anaphor-antecedent dependencies through a quantitative assessment of self-attention vectors. In both cases, we find that BERT encodes positional information about word tokens well on its lower layers, but switches to a hierarchically oriented encoding on higher layers. We conclude, then, that BERT's representations do indeed model linguistically relevant aspects of hierarchical structure, though they do not appear to show the sharp sensitivity to hierarchical structure that is found in human processing of reflexive anaphora.
