fMRI predictors based on language models of increasing complexity recover brain left lateralization

Laurent Bonnasse-Gahot, Christophe Pallier

TL;DR

An analysis of an fMRI dataset in which the complexity of large language models is manipulated shows that the models' performance in predicting brain responses follows a scaling law: the fit with brain activity increases linearly with the logarithm of the number of model parameters.

Abstract

Over the past decade, studies of naturalistic language processing in which participants are scanned while listening to continuous text have flourished. Using word embeddings at first, then large language models, researchers have created encoding models to analyze the brain signals. Presenting these models with the same text as the participants makes it possible to identify brain areas where there is a significant correlation between the functional magnetic resonance imaging (fMRI) time series and the ones predicted by the models' artificial neurons. One intriguing finding from these studies is that they have revealed highly symmetric bilateral activation patterns, somewhat at odds with the well-known left lateralization of language processing. Here, we report analyses of an fMRI dataset where we manipulate the complexity of large language models, testing 28 pretrained models from 8 different families, ranging from 124M to 14.2B parameters. First, we observe that the performance of models in predicting brain responses follows a scaling law, where the fit with brain activity increases linearly with the logarithm of the number of parameters of the model (and with its performance on natural language processing tasks). Second, although this effect is present in both hemispheres, it is stronger in the left than in the right hemisphere. Specifically, the left-right difference in brain correlation follows a scaling law with the number of parameters. This finding reconciles computational analyses of brain activity using large language models with the classic observation from aphasic patients showing left hemisphere dominance for language.
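The scaling law described in the abstract, a brain score that grows linearly with the logarithm of the parameter count, can be illustrated with a small least-squares fit. The sketch below is not the authors' code: the function names `fit_log_linear` and `bootstrap_slope_ci` are hypothetical, the intermediate model sizes are invented (only the 124M and 14.2B endpoints come from the text), and the scores are made-up numbers rather than results from the paper. It assumes a plain ordinary least-squares fit and a percentile bootstrap over models for the slope's confidence interval.

```python
import math
import random


def fit_log_linear(n_params, scores):
    """Least-squares fit of score = intercept + slope * log10(n_params)."""
    xs = [math.log10(n) for n in n_params]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(scores) / len(scores)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def bootstrap_slope_ci(n_params, scores, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the slope, resampling models."""
    rng = random.Random(seed)
    idx = range(len(n_params))
    slopes = []
    while len(slopes) < n_boot:
        sample = [rng.choice(idx) for _ in idx]
        if len({n_params[i] for i in sample}) < 2:
            continue  # degenerate resample: the slope is undefined
        slope, _ = fit_log_linear(
            [n_params[i] for i in sample], [scores[i] for i in sample]
        )
        slopes.append(slope)
    slopes.sort()
    lo = slopes[int((1 - level) / 2 * n_boot)]
    hi = slopes[int((1 + level) / 2 * n_boot) - 1]
    return lo, hi


# Hypothetical model sizes; only the two endpoints appear in the abstract.
sizes = [124e6, 355e6, 1.5e9, 2.7e9, 7e9, 14.2e9]
# Made-up brain scores per hemisphere (NOT data from the paper).
noise = [0.002, -0.001, 0.001, -0.002, 0.001, -0.001]
left = [-0.10 + 0.030 * math.log10(n) + e for n, e in zip(sizes, noise)]
right = [-0.10 + 0.020 * math.log10(n) - e for n, e in zip(sizes, noise)]
diff = [l - r for l, r in zip(left, right)]

slope_left, _ = fit_log_linear(sizes, left)
slope_right, _ = fit_log_linear(sizes, right)
slope_diff, _ = fit_log_linear(sizes, diff)
print("left slope:", slope_left)
print("right slope:", slope_right)
print("left-right slope:", slope_diff)
print("95% CI for left slope:", bootstrap_slope_ci(sizes, left))
```

Because least-squares fitting is linear in the scores, the slope of the left-right difference equals the left slope minus the right slope. A steeper left-hemisphere scaling law therefore shows up directly as a positive slope for the lateralization measure.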

Paper Structure (29 sections, 20 figures, 2 tables)

Figures (20)

  • Figure 1: Inter-subject reliable voxels. (a) Distribution of inter-subject correlations over voxels, computed by splitting the subjects into two subgroups and predicting one subgroup's average fMRI time-course from the other's (see main text for details). Voxels with a brain correlation above the dotted vertical line are the 25% of voxels with the largest correlations. (b) Glass brain representation of this inter-subject reliability measure. Hot colors and the dotted line show the 25% most reliable voxels.
  • Figure 2: Performance of various models in predicting fMRI brain time-courses. (a) Density estimates of the distributions of $r$-scores obtained for all 28 large language models, the random baselines, and GloVe. The densities are scaled to have the same maximum. (b) Average $r$-score as a function of the number of parameters of the model, in log scale. Here and in the next figures, the shaded area indicates the 95% confidence interval of the slope, computed with bootstrap. (c) Same, split by model family.
  • Figure 3: Brain correlation maps associated with the smallest (a) and the largest model (b). These maps show the increase in $r$-score relative to the model using the random embedding baseline 1024d.
  • Figure 4: Voxel-wise strength of the relationship between models' size and their predictive power. The slopes of the linear regression between $r$-score and the logarithm of the number of parameters are presented on a glass brain view. For readability, only voxels with $p$-values smaller than $10^{-7}$ are shown.
  • Figure 5: Brain correlation and performance on natural language tasks. (a) Brain correlation on the 25% most reliable voxels for all 28 large language models as a function of perplexity on the Wikitext-2 test set. Note that the x-axis is inverted, since lower perplexity indicates a better model. (b) Same with performance on the Hellaswag benchmark, where higher is better. (c) Same as Fig. \ref{fig:scaling_law_rv}b and panels (a) and (b), but restricted to the 10 largest models, those with more than 3B parameters. See Fig. \ref{fig:other_measures} for similar plots on the whole brain volume.
  • ...and 15 more figures
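
The reliability criterion described in the Figure 1 caption (compare two subgroups of subjects, then keep the 25% of voxels with the largest correlations) can be sketched as follows. This is a simplified illustration, not the authors' procedure: it uses the Pearson correlation between the two subgroup-average time-courses as the reliability measure, whereas the paper's exact prediction scheme is described in its main text. The function names, the data layout `data[subject][voxel]`, and the synthetic signal are all assumptions made for the example.

```python
import random


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def group_average(data, subjects):
    """Average time-course per voxel over the listed subjects."""
    n_voxels = len(data[subjects[0]])
    n_times = len(data[subjects[0]][0])
    return [
        [sum(data[s][v][t] for s in subjects) / len(subjects)
         for t in range(n_times)]
        for v in range(n_voxels)
    ]


def reliable_voxels(data, group_a, group_b, top_fraction=0.25):
    """Return per-voxel correlations and the indices of the top fraction."""
    avg_a = group_average(data, group_a)
    avg_b = group_average(data, group_b)
    r = [pearson(a, b) for a, b in zip(avg_a, avg_b)]
    n_keep = max(1, round(top_fraction * len(r)))
    ranked = sorted(range(len(r)), key=lambda v: r[v], reverse=True)
    return r, sorted(ranked[:n_keep])


# Synthetic data (NOT from the paper): voxels 0 and 1 share a signal
# across subjects, the remaining voxels are pure noise.
rng = random.Random(0)
n_subjects, n_voxels, n_times = 8, 8, 40
shared = [rng.gauss(0, 1) for _ in range(n_times)]
data = []
for s in range(n_subjects):
    subject = []
    for v in range(n_voxels):
        if v < 2:
            series = [x + rng.gauss(0, 0.5) for x in shared]
        else:
            series = [rng.gauss(0, 1) for _ in range(n_times)]
        subject.append(series)
    data.append(subject)

r_values, selected = reliable_voxels(
    data, group_a=[0, 1, 2, 3], group_b=[4, 5, 6, 7]
)
print("per-voxel correlations:", [round(r, 2) for r in r_values])
print("most reliable voxels:", selected)
```

Restricting later analyses to voxels selected this way focuses them on regions whose responses are consistent across participants, which is the role the 25% most reliable voxels play in the Figure 5 caption.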