Contextual Feature Extraction Hierarchies Converge in Large Language Models and the Brain

Gavin Mischler; Yinghao Aaron Li; Stephan Bickel; Ashesh D. Mehta; Nima Mesgarani

TL;DR

This study investigates why large language models (LLMs) converge toward brain-like language processing. It compares 12 open-source LLMs (~7B parameters) with intracranial EEG recordings from human auditory and language areas, collected while subjects listened to speech, and maps LLM embeddings to neural responses via ridge regression. Higher-performing LLMs predict neural responses better and exhibit more brain-like hierarchical processing, with peak alignment in middle layers and reliance on longer contextual information. These results suggest that brain-like representations emerge from hierarchical feature extraction and contextual processing, and that brain-inspired principles could guide the development of more cognitively aligned AI and progress toward artificial general intelligence.

Abstract

Recent advancements in artificial intelligence have sparked interest in the parallels between large language models (LLMs) and human neural processing, particularly in language comprehension. While prior research has established similarities in the representation of LLMs and the brain, the underlying computational principles that cause this convergence, especially in the context of evolving LLMs, remain elusive. Here, we examined a diverse selection of high-performance LLMs with similar parameter sizes to investigate the factors contributing to their alignment with the brain's language processing mechanisms. We find that as LLMs achieve higher performance on benchmark tasks, they not only become more brain-like as measured by higher performance when predicting neural responses from LLM embeddings, but also their hierarchical feature extraction pathways map more closely onto the brain's while using fewer layers to do the same encoding. We also compare the feature extraction pathways of the LLMs to each other and identify new ways in which high-performing models have converged toward similar hierarchical processing mechanisms. Finally, we show the importance of contextual information in improving model performance and brain similarity. Our findings reveal the converging aspects of language processing in the brain and LLMs and offer new directions for developing models that align more closely with human cognitive processing.
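
The brain-similarity measure summarized above (predicting neural word responses from LLM embeddings with ridge regression, then scoring each electrode-layer pair by correlation) can be illustrated with a minimal sketch. This is not the authors' code: the function names `ridge_fit` and `brain_score`, the regularization strength, the 5-fold cross-validation, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge weights for centered X (n, d) and centered y (n,)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

def brain_score(embeddings, responses, alpha=1.0, n_folds=5):
    """Cross-validated Pearson r between predicted and actual word responses.

    embeddings: (n_words, n_features) array for one LLM layer
    responses:  (n_words,) array of word responses for one electrode
    alpha and n_folds are illustrative choices, not values from the paper.
    """
    n_words = embeddings.shape[0]
    preds = np.zeros(n_words)
    for test_idx in np.array_split(np.arange(n_words), n_folds):
        train_mask = np.ones(n_words, dtype=bool)
        train_mask[test_idx] = False
        X_train, y_train = embeddings[train_mask], responses[train_mask]
        # Center using training statistics only, to avoid leaking test data.
        x_mean, y_mean = X_train.mean(axis=0), y_train.mean()
        w = ridge_fit(X_train - x_mean, y_train - y_mean, alpha)
        preds[test_idx] = (embeddings[test_idx] - x_mean) @ w + y_mean
    return np.corrcoef(preds, responses)[0, 1]

# Synthetic example: 4 hypothetical "layers", only layer 2 drives the response.
rng = np.random.default_rng(0)
n_words, n_features, n_layers = 400, 20, 4
layer_embeddings = [rng.normal(size=(n_words, n_features)) for _ in range(n_layers)]
true_w = rng.normal(size=n_features)
responses = layer_embeddings[2] @ true_w + 0.5 * rng.normal(size=n_words)

scores = [brain_score(E, responses) for E in layer_embeddings]
peak_layer = int(np.argmax(scores))  # layer with the highest brain correlation
```

Repeating `brain_score` over every electrode and layer would give one score per electrode-layer pair, from which a peak score and a peak layer per electrode can be read off.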

Paper Structure (17 sections, 9 figures, 1 table)

Introduction
Results
Brain Similarity of Large Language Models
Alignment of Language Processing Hierarchies Between Models and the Brain
Contextual Content Supports Brain Hierarchy Alignment
Discussion
Hierarchical Processing and Inter-Model Comparisons
Feature Extraction Efficiency and Contextual Processing
Convergence to Brain-Like Models for Human-Level Artificial General Intelligence
Methods
Human Intracranial Recordings
Large Language Models
Ridge Regression Mapping from Embeddings to Neural Responses
Electrode Localization and Brain Plotting
Comparing LLMs with Centered Kernel Alignment
...and 2 more sections

Figures (9)

Figure 1: Mapping LLM embeddings to the brain. Speech-responsive electrodes are shown on an inflated brain (shaded by their responsiveness t-value from a paired t-test between speech and silence). As subjects listened to speech, the average neural response in a $100$ ms window around a word center was used as a given electrode's word response. The same text was fed to an LLM and the embeddings from all $32$ layers were extracted. Ridge regression was used to predict the word responses from the LLM representations, producing a brain correlation score for each electrode-layer pair.
Figure 1: Subject-wise electrode localization. Electrodes are plotted on the inflated Freesurfer average brain and are colored by their corresponding subject identity.
Figure 2: Peak brain correlations and layers relate to LLM performance. A) Average brain correlation over all electrodes for each LLM. LLMs are colored in order of their separately-measured benchmark performance, with blue/purple models performing the worst and yellow models performing the best. Shaded regions indicate standard error of the mean over electrodes. B) The peak correlation over all layers of a given model was computed for each electrode, then averaged over all electrodes. Bars indicate standard error of the mean over electrodes. Average peak correlation score is significantly related to LLM performance (Pearson $r=0.92, p=2.24\times10^{-5}$). Stars indicate statistical significance level thresholds of $p<0.05$, $p<0.01$, and $p<0.001$ with *, **, and ***, respectively. C) The peak scoring layer of each model was computed for each electrode. Then electrodes were sorted by distance from pmHG and a sliding window average (centered, $n=50$) was taken across the electrodes of each model to compute the smoothed, local estimate of the most brain-like LLM layer. The peak scoring layer generally increases with distance from pmHG, and the better models (yellow) peak at lower layers compared to the worse models (blue/purple). D) The average peak layer for a given model over all electrodes is shown with bars indicating standard error of the mean. Average peak layer is significantly negatively related to LLM performance (Pearson $r=-0.81, p=0.0013$).
Figure 2: Effect of regression hyperparameters on scores. The left plot shows the pairwise effects on the peak brain similarity scores when altering the number of principal components of the LLM embeddings used for computing scores with ridge regression, keeping the $100$ ms window size constant. The right plot shows the pairwise effects of altering the width of the averaging window around word centers for estimating neural responses to words, keeping the PCA dimensionality of $500$ constant. Along each plot’s diagonal is the marginal distribution for that hyperparameter setting. The off-diagonal plots display scatter plots of the peak scores of all models together for one hyperparameter setting against another. Each dot represents the peak brain correlation score for one model-electrode pair. All pairs of settings produce highly correlated scores, as written in each subplot (Pearson correlation, *** indicates $p<0.001$).
Figure 3: Better LLMs display more brain-like hierarchical processing. A) Examples of computing the brain hierarchy alignment are shown for two models: XwinLM (the model with the highest alignment score) and Galactica (the model with the lowest alignment score). Electrodes were first binned into a hierarchy by distance from pmHG. Within a bin, the correlations over all 32 layers were normalized between 0 and 1 and then averaged over electrodes in the bin, producing one row for each bin in the matrix on the left. The center of mass (C.o.M.) of the distribution of brain similarity scores over LLM layers for each bin was computed and plotted in the scatter plot to the right. The brain hierarchy alignment score was then computed as the Pearson correlation between LLM layer C.o.M. and distance from pmHG. B) A scatter plot of brain hierarchy alignment scores and LLM performance shows a significant positive correlation (Pearson $r=0.79, p=0.0021$, ** indicates $p<0.01$). Line and shaded region show the linear regression fit and bootstrapped ($n=1000$) $95\%$ confidence interval.
...and 4 more figures
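
The brain hierarchy alignment score described in the Figure 3 caption (bin electrodes by distance from pmHG, normalize and average the layer-wise scores within each bin, take the center of mass over layers, then correlate it with distance) can be sketched as follows. This is a hedged illustration, not the authors' implementation: the function name `hierarchy_alignment`, the equal-count binning, the choice of 10 bins, the per-electrode min-max normalization, and the synthetic data are assumptions.

```python
import numpy as np

def hierarchy_alignment(scores, distances, n_bins=10):
    """Pearson r between per-bin layer center of mass and distance from pmHG.

    scores:    (n_electrodes, n_layers) brain correlation scores
    distances: (n_electrodes,) distance of each electrode from pmHG
    n_bins and equal-count binning are illustrative assumptions.
    """
    layers = np.arange(scores.shape[1])
    order = np.argsort(distances)  # sort electrodes along the hierarchy
    com, bin_dist = [], []
    for idx in np.array_split(order, n_bins):
        s = scores[idx]
        # Min-max normalize each electrode's scores over layers to [0, 1]
        # (assumed reading of "normalized between 0 and 1").
        s_min = s.min(axis=1, keepdims=True)
        s_range = s.max(axis=1, keepdims=True) - s_min
        normed = (s - s_min) / np.where(s_range > 0, s_range, 1.0)
        profile = normed.mean(axis=0)  # one row of the bin-by-layer matrix
        com.append((layers * profile).sum() / profile.sum())
        bin_dist.append(distances[idx].mean())
    com, bin_dist = np.array(com), np.array(bin_dist)
    return np.corrcoef(com, bin_dist)[0, 1], com, bin_dist

# Synthetic example: the best-scoring layer drifts deeper with distance.
rng = np.random.default_rng(0)
n_electrodes, n_layers = 200, 32
distances = rng.uniform(0, 50, size=n_electrodes)
peak = 8 + 0.3 * distances  # hypothetical peak layer per electrode
layer_idx = np.arange(n_layers)
scores = np.exp(-0.5 * ((layer_idx[None, :] - peak[:, None]) / 4.0) ** 2)
scores += 0.05 * rng.normal(size=scores.shape)

alignment, com, bin_dist = hierarchy_alignment(scores, distances)
```

Under this sketch, a model whose later layers best predict electrodes farther from pmHG yields an alignment score near 1, while a model with no such ordering yields a score near 0.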
