Different types of syntactic agreement recruit the same units within large language models

Daria Kryvosheieva, Andrea de Varda, Evelina Fedorenko, Greta Tuckute

TL;DR

This study probes how grammatical knowledge is represented inside large language models by applying a neuroscience-inspired functional localization approach to identify the units most responsive to 67 English syntactic phenomena in seven open-weight models. Different types of syntactic agreement recruit overlapping sets of units, suggesting a shared functional substrate for agreement, and the same pattern appears in Russian and Chinese. Targeted ablations show that the localized units causally support the models' syntactic performance. In a cross-lingual analysis of 57 languages, structurally more similar languages share more units for subject-verb agreement, which suggests that syntactic representations are organized beyond individual phenomena yet remain sensitive to language structure. The findings argue against purely generic syntax-processing units and point instead to a shared but language-tuned organization of agreement in LLMs, with implications for cognitive science and multilingual NLP research.

Abstract

Large language models (LLMs) can reliably distinguish grammatical from ungrammatical sentences, but how grammatical knowledge is represented within the models remains an open question. We investigate whether different syntactic phenomena recruit shared or distinct components in LLMs. Using a functional localization approach inspired by cognitive neuroscience, we identify the LLM units most responsive to 67 English syntactic phenomena in seven open-weight models. These units are consistently recruited across sentences containing the phenomena and causally support the models' syntactic performance. Critically, different types of syntactic agreement (e.g., subject-verb, anaphor, determiner-noun) recruit overlapping sets of units, suggesting that agreement constitutes a meaningful functional category for LLMs. This pattern holds in English, Russian, and Chinese. Further, in a cross-lingual analysis of 57 diverse languages, structurally more similar languages share more units for subject-verb agreement. Taken together, these findings reveal that syntactic agreement, a critical marker of syntactic dependencies, constitutes a meaningful category within LLMs' representational spaces.
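
The localization idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the contrast statistic (a Welch t-statistic between grammatical and ungrammatical sentences), the function names `localize_top_units` and `percent_overlap`, and the synthetic activations are all assumptions made for illustration. Only the idea of keeping the most responsive 1% of units and comparing unit sets by their overlap comes from the text.

```python
import numpy as np


def localize_top_units(act_a, act_b, fraction=0.01):
    """Return the unit indices with the largest |Welch t| between two conditions.

    act_a, act_b: arrays of shape (n_sentences, n_units), e.g. activations for
    grammatical and ungrammatical sentences (assumed layout).
    """
    mean_diff = act_a.mean(axis=0) - act_b.mean(axis=0)
    # Per-unit squared standard errors of the two condition means.
    se_a = act_a.var(axis=0, ddof=1) / act_a.shape[0]
    se_b = act_b.var(axis=0, ddof=1) / act_b.shape[0]
    t_stat = mean_diff / np.sqrt(se_a + se_b + 1e-12)
    k = max(1, int(round(fraction * act_a.shape[1])))
    # Keep the k units with the strongest contrast in either direction.
    return set(np.argsort(-np.abs(t_stat))[:k].tolist())


def percent_overlap(units_a, units_b):
    """Size of the intersection as a percentage of the first set's size."""
    return 100.0 * len(units_a & units_b) / len(units_a)


# Synthetic demo (hypothetical data): 1000 units, 10 of which respond to the contrast.
rng = np.random.default_rng(0)
n_units = 1000
planted = list(range(10))
grammatical = rng.normal(size=(200, n_units))
ungrammatical = rng.normal(size=(200, n_units))
grammatical[:, planted] += 3.0  # planted units respond more to grammatical input

top_units = localize_top_units(grammatical, ungrammatical, fraction=0.01)
self_overlap = percent_overlap(top_units, top_units)
```

In this toy setting the 1% selection recovers the planted units. Running the same selection on two phenomena and passing the resulting sets to `percent_overlap` gives the kind of pairwise overlap score the paper reports.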

Paper Structure

This paper contains 21 sections, 22 figures, 1 table.

Figures (22)

  • Figure 1: Consistency of LLM units engaged across sentence instances for each syntactic phenomenon. Bars show overlap between unit sets identified in two independent halves of the data (2-fold cross-validation) for each of the 67 BLiMP phenomena, averaged across seven models. Bars are sorted from highest to lowest overlap (with percent overlap shown on the right); bar colors denote the category groupings in BLiMP and dots show individual models. Gray bars show two control conditions: Random (analytical expected overlap) and BLiMP-Control (randomized condition labels applied to grammatical sentences).
  • Figure 2: Performance impact of syntax-responsive unit ablation. Each colored bar shows the average difference in accuracy between the top-unit-ablated (1%) and unablated models for each of the 67 BLiMP phenomena, averaged across seven models. Bars are sorted from highest to lowest top-unit ablation performance drop. Gray bars show accuracy differences from ablating a random 1% of units (averaged across seven models and four random seeds). Error bars denote 95% confidence intervals over models.
  • Figure 3: Unit overlaps for all 2,211 pairs of BLiMP phenomena, averaged across seven models. Blue bars indicate unit overlaps for pairs of phenomena within the same syntactic category; orange bars indicate overlaps for pairs of phenomena across categories. The inset shows the pairs with the highest overlaps.
  • Figure 4: Within-category and cross-category overlaps in BLiMP. Each blue bar shows the average (across all pairs of phenomena within a category) intersection of localized unit sets (as a percentage out of the 1% target set; averaged across models). Each corresponding orange bar shows the average intersection across all pairs of phenomena belonging to distinct categories. The categories are ordered by the difference between the within-category and cross-category average overlap. Markers denote individual models.
  • Figure 5: Overlap of localized units within and across agreement categories. Bars show average overlap (as a percentage of top-1% units; averaged across models) between pairs of phenomena: (i) within each agreement category (blue; same data as in Figure 4), (ii) across agreement categories (orange), and (iii) between agreement and non-agreement phenomena (gray).
  • ...and 17 more figures
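
The quantities in the captions above can be checked with a short sketch. The pair count follows directly from 67 phenomena. The random baseline is an assumption about what "analytical expected overlap" means: for two sets of k units drawn uniformly at random from N units, the expected intersection is k²/N units, which is k/N of the set size. The phenomenon names, categories, and overlap values in the toy example are hypothetical, not results from the paper.

```python
from itertools import combinations

# Number of unordered pairs among the 67 BLiMP phenomena.
n_phenomena = 67
n_pairs = len(list(combinations(range(n_phenomena), 2)))


def expected_random_overlap_percent(n_units, k):
    """Expected overlap of two random k-unit sets, as a percentage of k.

    The expected intersection size is k * k / n_units (hypergeometric mean).
    """
    return 100.0 * k / n_units


def mean_within_and_cross(overlap, category):
    """Average pairwise overlaps separately for same-category and
    different-category pairs of phenomena."""
    within, cross = [], []
    for (a, b), value in overlap.items():
        (within if category[a] == category[b] else cross).append(value)
    return sum(within) / len(within), sum(cross) / len(cross)


# Toy example with hypothetical phenomena and overlap percentages.
category = {"sv_1": "agreement", "sv_2": "agreement", "island_1": "island"}
overlap = {
    ("sv_1", "sv_2"): 40.0,
    ("sv_1", "island_1"): 5.0,
    ("sv_2", "island_1"): 7.0,
}
within_mean, cross_mean = mean_within_and_cross(overlap, category)
random_baseline = expected_random_overlap_percent(n_units=100_000, k=1_000)
```

Under this assumption, two random 1% unit sets overlap by about 1% of their size, which is the scale against which the within-category and cross-category overlaps are read.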