Causal Interventions Reveal Shared Structure Across English Filler-Gap Constructions
Sasha Boguraev, Christopher Potts, Kyle Mahowald
TL;DR
The paper investigates whether language models instantiate shared abstract filler-gap representations across English constructions. It employs causal interventions within the Distributed Alignment Search (DAS) framework to transfer a learned filler-gap mechanism from one construction to others, and it quantifies transfer with an odds-based metric. Results reveal robust cross-construction generalization, modulated by lexical features like animacy and by construction frequency and similarity, though transfer across clausal boundaries is limited. These findings show that mechanistic analyses of LMs can meaningfully inform linguistic theory and hypotheses about human syntactic processing.
Abstract
Language models (LMs) have emerged as powerful sources of evidence for linguists seeking to develop theories of syntax. In this paper, we argue that causal interpretability methods, applied to LMs, can greatly enhance the value of such evidence by helping us characterize the abstract mechanisms that LMs learn to use. Our empirical focus is a set of English filler-gap dependency constructions (e.g., questions, relative clauses). Linguistic theories largely agree that these constructions share many properties. Using experiments based on Distributed Interchange Interventions, we show that LMs converge on similar abstract analyses of these constructions. These analyses also reveal previously overlooked factors (relating to frequency, filler type, and surrounding context) that could motivate changes to standard linguistic theory. Overall, these results suggest that mechanistic, internal analyses of LMs can push linguistic theory forward.
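In DAS, a distributed interchange intervention rotates a hidden representation into a new basis, overwrites a low-dimensional subspace with the values that the model computed on a different (source) input, and rotates back. In the setting described above, the source and base inputs might come from different filler-gap constructions. The minimal NumPy sketch below illustrates only that core operation, under stated assumptions: the rotation is a random orthogonal matrix standing in for the rotation that DAS actually learns by gradient descent, the intervened subspace is taken to be the first `k` rotated coordinates, and the function names `random_rotation` and `interchange_intervention` are hypothetical rather than taken from the paper's code.

```python
import numpy as np


def random_rotation(dim: int, seed: int = 0) -> np.ndarray:
    """Return a random orthogonal matrix.

    Assumption: a stand-in for the learned DAS rotation, which in practice
    is trained so that the swapped subspace carries the causal variable.
    """
    rng = np.random.default_rng(seed)
    # The Q factor of a QR decomposition of a square Gaussian matrix is orthogonal.
    q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q


def interchange_intervention(
    h_base: np.ndarray, h_source: np.ndarray, rotation: np.ndarray, k: int
) -> np.ndarray:
    """Replace the first k rotated coordinates of h_base with those of h_source."""
    z_base = rotation @ h_base      # base hidden state in the rotated basis
    z_source = rotation @ h_source  # source hidden state in the rotated basis
    z_new = z_base.copy()
    z_new[:k] = z_source[:k]        # swap in the (hypothetical) k-dim subspace
    # The rotation is orthogonal, so its transpose maps back to the model's basis.
    return rotation.T @ z_new


# Usage: intervene on a toy 8-dimensional hidden state.
R = random_rotation(8)
rng = np.random.default_rng(1)
h_base, h_source = rng.standard_normal(8), rng.standard_normal(8)
h_patched = interchange_intervention(h_base, h_source, R, k=3)
```

With `k=0` the base state is returned unchanged, and with `k` equal to the full dimension the source state is copied wholesale; the interesting regime is a small `k`, where only the aligned subspace is transplanted and the rest of the base representation is preserved. In a real experiment, `h_patched` would be written back into the model's forward pass on the base input, and the resulting change in the model's predictions would then be scored.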
