The LLM Language Network: A Neuroscientific Approach for Identifying Causally Task-Relevant Units

Badr AlKhamissi, Greta Tuckute, Antoine Bosselut, Martin Schrimpf

TL;DR

This work applies neuroscience-inspired localizers to 18 LLMs to identify language-selective units, then demonstrates their causal role: ablating these units, but not random units, severely impairs performance on language tasks. The language-selective units also align more closely with recordings from the human brain’s language system than random units do. While language specialization is robust across models, attempts to localize Multiple Demand (MD) and Theory of Mind (ToM) networks yield model-dependent results, suggesting that such specialization emerges in only some current LLMs. Collectively, the findings point to parallels between functional specialization in LLMs and the brain’s organization, and highlight areas for future multimodal and cross-domain exploration.

Abstract

Large language models (LLMs) exhibit remarkable capabilities on not just language tasks, but also various tasks that are not linguistic in nature, such as logical reasoning and social inference. In the human brain, neuroscience has identified a core language system that selectively and causally supports language processing. We here ask whether similar specialization for language emerges in LLMs. We identify language-selective units within 18 popular LLMs, using the same localization approach that is used in neuroscience. We then establish the causal role of these units by demonstrating that ablating LLM language-selective units -- but not random units -- leads to drastic deficits in language tasks. Correspondingly, language-selective LLM units are more aligned to brain recordings from the human language system than random units. Finally, we investigate whether our localization method extends to other cognitive domains: while we find specialized networks in some LLMs for reasoning and social capabilities, there are substantial differences among models. These findings provide functional and causal evidence for specialization in large language models, and highlight parallels with the functional organization in the brain.

Paper Structure

This paper contains 49 sections, 10 figures, 11 tables.

Figures (10)

  • Figure 1: Identifying Specialized and Causally Task-Relevant Units in LLMs. (1) To identify language-selective units, we compare unit activations in response to language (sentences) versus a matched control condition (lists of non-words), and identify the units that exhibit the strongest selectivity to sentences over non-words. The same method is used in neuroscience to localize the human brain's language network (e.g., Fedorenko2010NewMF). (2) Testing the causal role of the identified language-selective units, we ablate those units as well as a set of random units, and (3) compare the resulting performance drop. Ablating 1% of LLM language units leads to vast language deficits ($p<5^{-238}$) for all models tested. Beyond language, only a few models exhibit specialization for reasoning (n=3, $p<5^{-2}$, Multiple Demand network) and social inferences (n=4, $p<5^{-5}$, Theory of Mind network). Plots averaged across n LLMs each; random control repeated with 3 different seeds.
  • Figure 2: Distribution of Language Units Across Layers. (a) The distribution of the top 1% most language-selective units across layers in a sample of five different models. The models are displayed from top to bottom, with each layer labeled by the percentage of units identified as belonging to the top 1% language-selective units. (b) The language selectivity index for all models in the study (n=18) plotted against the relative depth of the layers.
  • Figure 3: Lesion Studies. The average performance change after ablating the top x% of language-selective units, compared to ablating three random sets of units for each model. Performance is evaluated across 10 models and three language benchmarks: (a) SyntaxGym, (b) BLiMP, and (c) GLUE, with (d) presenting results for individual subtasks within GLUE when ablating the top 1% of language units.
  • Figure 4: Language-Selective Model Units Are Selective for Language and Exhibit Similar Response Profiles as the Language Network in the Brain. Univariate condition-level responses for the brain (green) and models (blue). (a) Examples of the four experimental conditions used in this analysis, with the '+/-' signs denoting whether the condition contains lexical or syntactic information, respectively. (b) Human language network responses to the four conditions; data from Shain2024a. Brain activity is strongest to S, followed by W and J, and weakest to N. (c) Language-selective unit responses to the four conditions averaged across 10 models and condition samples. (d) Control responses from random units averaged across condition samples and 10 models, with 3 random seeds each.
  • Figure 5: Language Units are Aligned to Brain Data. Raw Pearson correlation between predicted brain activity from the x% of model units and actual brain activity in the human language network across 10 models. The alignment of language-selective units shows significantly greater correlation compared to the average of three sets of randomly selected units when selecting a small subset of units. Error bars represent 95% confidence intervals calculated across models. See Table \ref{tab:num-units} for the number of units corresponding to each percentage level per model.
  • ...and 5 more figures
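
The localize-then-ablate procedure described in the Figure 1 caption can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the captions do not say which selectivity statistic is used, so a Welch t-statistic is assumed here; ablation is assumed to mean setting a unit's activation to zero; and the function names (`welch_t`, `localize_units`, `ablate`, `random_units`) and the synthetic data are hypothetical.

```python
import math
import random
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two samples (positive when mean(a) > mean(b))."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def localize_units(sentence_acts, nonword_acts, fraction=0.01):
    """Return indices of the top `fraction` of units by sentence > non-word selectivity.

    Each argument is a list of stimuli; each stimulus is a list of per-unit
    activations. The t-statistic is an assumed selectivity score.
    """
    n_units = len(sentence_acts[0])
    scores = [
        welch_t([s[u] for s in sentence_acts], [s[u] for s in nonword_acts])
        for u in range(n_units)
    ]
    k = max(1, round(fraction * n_units))
    ranked = sorted(range(n_units), key=lambda u: scores[u], reverse=True)
    return sorted(ranked[:k])


def ablate(activations, unit_indices):
    """Zero the chosen units in one activation vector (assumed form of lesion)."""
    lesioned = list(activations)
    for u in unit_indices:
        lesioned[u] = 0.0
    return lesioned


def random_units(n_units, k, seed):
    """Size-matched random control set of unit indices."""
    return sorted(random.Random(seed).sample(range(n_units), k))


# Synthetic demo: 200 units, two of which respond more strongly to sentences.
rng = random.Random(0)
n_units, selective = 200, {3, 57}
sentences = [
    [rng.gauss(5.0 if u in selective else 0.0, 1.0) for u in range(n_units)]
    for _ in range(40)
]
nonwords = [[rng.gauss(0.0, 1.0) for _ in range(n_units)] for _ in range(40)]

language_units = localize_units(sentences, nonwords, fraction=0.01)
control_units = random_units(n_units, len(language_units), seed=0)
print("language-selective units:", language_units)
print("random control units:", control_units)
```

In an actual lesion study the zeroing would be applied inside the model's forward pass, and benchmark performance would be compared between the language-unit ablation and several size-matched random ablations, as the caption describes with three seeds.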