Causal Interventions Reveal Shared Structure Across English Filler-Gap Constructions

Sasha Boguraev, Christopher Potts, Kyle Mahowald

TL;DR

The paper investigates whether language models instantiate shared abstract filler-gap representations across English constructions. It uses causal interventions within the Distributed Alignment Search framework to transfer a filler-gap mechanism learned on one construction to other constructions, and it quantifies that transfer with an odds-based metric. The results show robust cross-construction generalization, modulated by lexical features such as animacy and by construction frequency and similarity, although transfer across clausal boundaries is limited. These findings suggest that mechanistic analyses of LMs can meaningfully inform linguistic theory and hypotheses about human syntactic processing.
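The core operation behind a distributed interchange intervention can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: in Distributed Alignment Search the orthogonal rotation is learned, whereas here a fixed Householder reflection stands in for it, and the names `householder`, `to_subspace`, `from_subspace`, and `interchange`, as well as the subspace size `k`, are hypothetical choices made for this example.

```python
def householder(v):
    """Return the orthogonal matrix I - 2 v v^T / (v^T v).

    Assumption: this fixed matrix is a stand-in for the rotation
    that Distributed Alignment Search would learn.
    """
    n = len(v)
    norm_sq = sum(x * x for x in v)
    return [
        [(1.0 if i == j else 0.0) - 2.0 * v[i] * v[j] / norm_sq for j in range(n)]
        for i in range(n)
    ]


def to_subspace(rotation, h):
    """Express hidden vector h in the rotated basis (computes R h)."""
    return [sum(r * x for r, x in zip(row, h)) for row in rotation]


def from_subspace(rotation, z):
    """Map rotated coordinates back to the original basis (computes R^T z)."""
    n = len(z)
    return [sum(rotation[i][j] * z[i] for i in range(n)) for j in range(n)]


def interchange(base_h, source_h, rotation, k):
    """Swap the first k rotated coordinates of base_h for those of source_h.

    The remaining coordinates of base_h are left untouched, so only the
    k-dimensional subspace carries information from the source input.
    """
    base_z = to_subspace(rotation, base_h)
    source_z = to_subspace(rotation, source_h)
    mixed = source_z[:k] + base_z[k:]
    return from_subspace(rotation, mixed)


# Hypothetical 4-dimensional hidden states for a base and a source input.
rotation = householder([1.0, 2.0, 3.0, 4.0])
base = [0.5, -1.0, 2.0, 3.0]
source = [1.0, 1.0, -2.0, 0.0]
patched = interchange(base, source, rotation, k=2)
```

In an actual experiment, `patched` would replace the base input's hidden state at a chosen layer and position, and the model's downstream predictions would then be compared against what the source construction would license.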

Abstract

Language Models (LMs) have emerged as powerful sources of evidence for linguists seeking to develop theories of syntax. In this paper, we argue that causal interpretability methods, applied to LMs, can greatly enhance the value of such evidence by helping us characterize the abstract mechanisms that LMs learn to use. Our empirical focus is a set of English filler-gap dependency constructions (e.g., questions, relative clauses). Linguistic theories largely agree that these constructions share many properties. Using experiments based in Distributed Interchange Interventions, we show that LMs converge on similar abstract analyses of these constructions. These analyses also reveal previously overlooked factors -- relating to frequency, filler type, and surrounding context -- that could motivate changes to standard linguistic theory. Overall, these results suggest that mechanistic, internal analyses of LMs can push linguistic theory forward.

Paper Structure

This paper contains 39 sections, 3 equations, 21 figures, 10 tables.

Figures (21)

  • Figure 1: Causal intervention overview. Here, we illustrate our methodology when we intervene within a class, transferring an embedded wh- filler-gap structure into a corresponding minimal pair that didn't previously have one. We then show intervening across classes, inserting a wh- filler-gap into a gap-less cleft sentence.
  • Figure 2: Average normalized max odds across positions, $\pm 1$ standard error. Corresponding multi-clause plots can be found in Appendix \ref{app:exp1-supp}. Note that normalization fixes the "Same Animacy, In Train Set" condition at 1.00.
  • Figure 3: For each source construction, we measure the odds at each position--layer pair, aggregating the values by evaluation group. Corresponding plots with control values and multi-clause variants are in Appendix \ref{app:exp1-supp}.
  • Figure 4: Top: Generalization network at single-clause the position with edge-threshold of 1. Node size proportional to in-degree; edge size and color proportional to odds of the source construction's interventions measured on the target construction. Bottom: In- and out-degree centrality AUCs against construction frequency.
  • Figure 5: Constructions plotted along the top two principal components at each position in our single-clause variants. Generally, constructions cluster in linguistically intuitive ways -- e.g. animate/inanimate pairs generally cluster, constructions with wh-fillers cluster at the filler position, and restrictive relative clauses typically lie away from the other analyzed constructions.
  • ...and 16 more figures