Brain-Like Language Processing via a Shallow Untrained Multihead Attention Network

Badr AlKhamissi, Greta Tuckute, Antoine Bosselut, Martin Schrimpf

TL;DR

The paper investigates why untrained language models, whose representations are shaped only by architectural priors, can align with the human language system (HLS). It introduces SUMA, a shallow, untrained multihead attention encoder with a virtual depth of two layers, from which language-selective units are localized, and shows that tokenization strategy and multihead attention are the primary drivers of brain alignment, with a simple form of recurrence providing further gains. When paired with a trainable decoder, SUMA's representations improve sample and parameter efficiency in language modeling over comparable architectures and achieve state-of-the-art alignment with human reading times, suggesting that the human language system may function as a shallow feature encoder feeding a downstream decoder. The work highlights the importance of architectural priors over large-scale training for brain alignment and language production, while calling for better brain benchmarks to reliably assess neural correspondences and ceilings.

Abstract

Large Language Models (LLMs) have been shown to be effective models of the human language system, with some models predicting most explainable variance of brain activity in current datasets. Even in untrained models, the representations induced by architectural priors can exhibit reasonable alignment to brain data. In this work, we investigate the key architectural components driving the surprising alignment of untrained models. To estimate LLM-to-brain similarity, we first select language-selective units within an LLM, similar to how neuroscientists identify the language network in the human brain. We then benchmark the brain alignment of these LLM units across five different brain recording datasets. By isolating critical components of the Transformer architecture, we identify tokenization strategy and multihead attention as the two major components driving brain alignment. A simple form of recurrence further improves alignment. We further demonstrate this quantitative brain alignment of our model by reproducing landmark studies in the language neuroscience field, showing that localized model units -- just like language voxels measured empirically in the human brain -- discriminate more reliably between lexical than syntactic differences, and exhibit similar response profiles under the same experimental conditions. Finally, we demonstrate the utility of our model's representations for language modeling, achieving improved sample and parameter efficiency over comparable architectures. Our model's estimates of surprisal set a new state-of-the-art in the behavioral alignment to human reading times. Taken together, we propose a highly brain- and behaviorally-aligned model that conceptualizes the human language system as an untrained shallow feature encoder, with structural priors, combined with a trained decoder to achieve efficient and performant language processing.
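
The unit-selection step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the contrast is a per-unit Welch t-statistic comparing activations for sentences against activations for lists of non-words; the abstract does not name the statistic, so that choice, the function name `select_language_units`, and the toy data are all hypothetical and are not taken from the authors' code.

```python
import numpy as np

def select_language_units(sent_acts, nonword_acts, k):
    """Return indices of the k units with the largest sentences > non-words contrast.

    sent_acts, nonword_acts: arrays of shape (n_stimuli, n_units).
    The Welch t-statistic used here is an assumption made for illustration.
    """
    mean_diff = sent_acts.mean(axis=0) - nonword_acts.mean(axis=0)
    # Standard error of the difference, allowing unequal variances per condition.
    se = np.sqrt(
        sent_acts.var(axis=0, ddof=1) / sent_acts.shape[0]
        + nonword_acts.var(axis=0, ddof=1) / nonword_acts.shape[0]
    )
    t_stat = mean_diff / (se + 1e-12)
    # Sort units by descending t-statistic and keep the first k.
    return np.argsort(-t_stat)[:k]

# Toy activations: 40 stimuli per condition, 64 units (made-up numbers).
rng = np.random.default_rng(0)
n_units = 64
sent = rng.normal(size=(40, n_units))
nonw = rng.normal(size=(40, n_units))
sent[:, :8] += 3.0  # units 0..7 respond more strongly to sentences

top_units = select_language_units(sent, nonw, k=8)
```

In this toy setup the selected indices are the eight units that were given a stronger response to sentences; with real model activations, `k` would be the number of units to localize.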

Paper Structure (59 sections, 14 figures, 4 tables)

Figures (14)

  • Figure 1: Evaluating Model Alignment to the Human Language System. (Left) Localization: We select the top-k language-selective units in models and brain recordings by contrasting the difference in unit activations between sentences and lists of non-words, following Fedorenko2010NewMF. (Center) Benchmarking: Across five different neural datasets, we measure the alignment between language-selective units in models and the human brain. Each model's score is the mean of linear predictivity scores for each of the five datasets. We built our model using the first three benchmarks for validation and additionally report scores on two held-out benchmarks. (Right) Proposed Model: We conceptualize language processing in the human brain as an untrained feature encoder (SUMA) providing representations to a downstream trainable decoder that produces language output.
  • Figure 2: Isolating Critical Components of the Transformer Architecture. All models are untrained, i.e., representations are driven by architectural priors alone. Brain alignment is evaluated via ridge regression using each model's top 4096 language-selective units (Figure 1) on the three validation datasets. The green dashed line indicates the estimated data reliability (cross-subject consistency). Each experiment is repeated 5 times with different model seed initializations; error bars indicate the variation across seeds. (a) Transformer block with components labeled for the ablation study in (b). The blue dashed box indicates our final model SUMA. (b) Multihead attention and tokenization strategy drive brain alignment for SUMA. Brain alignment of a single-block model with different architectural ablations (labels as in (a)). Attention (A) is implemented with 512 attention heads. The virtual depth of the model is two layers. Representations are taken in response to the last token. Mean implies taking the average of all tokens. PosEnc implies using positional encoding. (c) Increasing the number of attention heads increases brain alignment. Base architecture is two layers of T+LN1+A. (d) Recurrent application of weights increases brain alignment. Base architecture is T+LN1+A. Virtual depth is increased by unrolling the same set of weights multiple times in the depth dimension (a simple form of recurrence). "A" refers to adaptive depth relative to the number of tokens, computed as $\lceil \frac{\text{number of tokens}}{256} \rceil$.
  • Figure 3: Language Models Exhibit Similar Response Profiles as the HLS. Brain (green) and model (blue) responses for Univariate Condition-Level Responses (Top Row) and Multivariate Representational Pattern Analysis (Bottom Row). Each untrained model plot is the average across 5 different random model initializations. The error bars are across the different initializations and conditions. (a) Examples of the four experimental conditions used in this analysis, with the '+' and '-' signs denoting whether or not a condition contains lexical or syntactic information. (b) Brain responses to the four conditions; data from Shain2023. (c) SUMA responses to the four conditions. Control experiments show the effect of unit localization ("Lang Selective" vs "Random Sampling") and tokenization ("BPE" vs "Word-Based"). (d) The same univariate analysis for GPT2-XL and GPT2-Small models. (e-g) Same as (b-d) but for the multivariate analysis (Section \ref{sec:hls-response-profiles}). Brain data extracted from reported results in Fedorenko2012.
  • Figure 4: Localized Untrained Units Provide Representations Suitable for Language Modeling. (a-b) The WikiText-103 validation loss of SUMA variants and control models during training, when training (a) one Transformer block, and (b) two Transformer blocks. We train two variants of SUMA-based models: one that uses the output of the localized units as the input representation for the downstream decoder, and another that uses the final representation (Final Repr) of the untrained model as input. The baseline model refers to passing the static tokens directly to the decoder without any intermediate architecture. (c) Behavioral alignment to human reading times in Futrell2018 as a function of model complexity and efficiency measured by the number of FLOPs.
  • Figure 5: Isolating Critical Components of the Transformer Architecture. Similar plots as Figure 2 but aggregating the results across the three metrics on the three validation datasets instead of only using Linear Predictivity. Each experiment is repeated 5 times across different random initializations. (a) Transformer block with components labeled for the ablation study in (b). The blue dashed box indicates our final model SUMA. (b) Brain alignment of a two-layer SUMA with shared weights on different architectural ablations and 512 attention heads. We use the model representation of the last token, except for T+Mean. Labels refer to (a). (c) Increasing the number of attention heads increases brain alignment. We use here the T+LN+A architecture with a depth of two. (d) Brain alignment of the T+LN+A model as a function of the number of unrolling steps. Adaptive refers to adaptive depth relative to the number of tokens.
  • ...and 9 more figures
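
The recurrence described in the captions of Figures 2(d) and 5(d), where one set of weights is unrolled several times in depth and the adaptive depth is the number of tokens divided by 256, rounded up, can be sketched roughly as follows. This is an illustration only, not the authors' implementation: the single attention head, the absence of a causal mask, the residual connection, the weight scaling, and every function name are assumptions made to keep the example short.

```python
import math
import numpy as np

def adaptive_depth(n_tokens, tokens_per_step=256):
    # Adaptive virtual depth from the caption: ceiling(number of tokens / 256).
    return math.ceil(n_tokens / tokens_per_step)

def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def attention(x, wq, wk, wv):
    # Single-head, unmasked self-attention (a simplification of multihead attention).
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

def unrolled_encoder(x, shared_weights, depth):
    # Apply the SAME untrained weights `depth` times (a simple form of recurrence).
    for _ in range(depth):
        x = x + attention(layer_norm(x), *shared_weights)  # residual is an assumption
    return x

# Toy input: 300 token embeddings of width 16 (made-up sizes).
rng = np.random.default_rng(0)
n_tokens, d_model = 300, 16
tokens = rng.normal(size=(n_tokens, d_model))
shared = tuple(rng.normal(size=(d_model, d_model)) / np.sqrt(d_model) for _ in range(3))

depth = adaptive_depth(n_tokens)               # 300 tokens give a depth of 2
encoded = unrolled_encoder(tokens, shared, depth)
last_token_repr = encoded[-1]                  # representation of the last token
```

Because the weights are shared across unrolling steps, increasing the virtual depth adds computation but no parameters.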