Divergences between Language Models and Human Brains

Yuchen Zhou; Emmy Liu; Graham Neubig; Michael J. Tarr; Leila Wehbe

TL;DR

The paper investigates how language-model (LM) representations diverge from human brain responses during language processing, using MEG data recorded while subjects read and listened to narratives. It introduces a data-driven pipeline that predicts MEG signals from LM embeddings via ridge regression, uses an automatic LLM-based hypothesis proposer to describe where the predictions diverge from the recorded responses, and identifies two domains that LMs capture poorly: social/emotional intelligence and physical commonsense. Behavioral experiments corroborate these findings, and domain-specific fine-tuning on Social IQa and PiQA improves brain alignment within language-processing time windows. The work highlights concrete gaps in LM representations and shows that targeted fine-tuning can close some of them, though the analysis is limited to the narratives examined and broader datasets are left for future work.

Abstract

Do machines and humans process language in similar ways? Recent research has hinted at the affirmative, showing that human neural activity can be effectively predicted using the internal representations of language models (LMs). Although such results are thought to reflect shared computational principles between LMs and human brains, there are also clear differences in how LMs and humans represent and use language. In this work, we systematically explore the divergences between human and machine language processing by examining the differences between LM representations and human brain responses to language as measured by Magnetoencephalography (MEG) across two datasets in which subjects read and listened to narrative stories. Using an LLM-based data-driven approach, we identify two domains that LMs do not capture well: social/emotional intelligence and physical commonsense. We validate these findings with human behavioral experiments and hypothesize that the gap is due to insufficient representations of social/emotional and physical knowledge in LMs. Our results show that fine-tuning LMs on these domains can improve their alignment with human brain responses.

Paper Structure (46 sections, 9 equations, 13 figures, 8 tables, 1 algorithm)

This paper contains 46 sections, 9 equations, 13 figures, 8 tables, 1 algorithm.

Introduction
Predictive MEG Model
Data Preparation and Preprocessing
Predicting MEG Responses from LM Embeddings
Best Language Model Layer for Predicting MEG Responses
Spatio-temporal Pattern of Predictions
Identifying Phenomena of Interest
Automatically Discovering Differences between Brain Responses and LM Predictions
Proposed Hypotheses
Manual Hypothesis Verification
Selected Phenomena
Improving Brain Alignment via Fine-tuning
Datasets
Fine-tuning Setup
Comparing Fine-tuned Models with the Base Model
...and 31 more sections

Figures (13)

Figure 1: Schematic of our experimental approach. The LM takes as input the current word along with its preceding context to produce the current word's LM embedding. This embedding is then used as input to a ridge regression model to predict the human brain responses associated with the word. The Mean Squared Error (MSE) between the predicted and actual MEG responses is calculated. Finally, an LLM-based hypothesis proposer is employed to formulate natural language hypotheses explaining the divergence between the predicted and actual MEG responses.
Figure 2: Pearson correlation of actual MEG responses with predicted responses using embeddings from layer 7 of GPT-2 XL on the Harry Potter dataset. The displayed layout is a flattened representation of the helmet-shaped sensor array. Deeper reds indicate more accurate LM predictions. Language regions are well predicted in language processing time windows (refer to the "Spatio-temporal Pattern of Predictions" section for more details).
Figure 3: Pearson correlation between actual MEG responses and predicted responses from (A) GPT-2 XL and (B) Llama-2 across LM layers and time after word onset on the Harry Potter dataset. Both models exhibit high correlations in early and intermediate layers at around 200ms. Correlation is computed across words and averaged across MEG channels.
Figure 4: Distribution of human responses for (A) the top 10 and (B) the bottom 10 hypotheses, ranked by the percentage of 'Divergent Sentence' responses.
Figure 5: Performance comparison of the base model with models fine-tuned on (A) social and (B) physical datasets. Each panel's y-axis shows the percentage of channels in the fine-tuned model with better, worse, or non-significantly different performance (measured by Pearson correlation) compared to the base model. Fine-tuned models outperform the base model during language processing time windows. Refer to the appendix for a detailed view of each MEG channel plotted.
...and 8 more figures
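
The encoding step described in the captions for Figures 1 and 3 (ridge regression from LM embeddings to MEG responses, then MSE and Pearson correlation between predicted and actual responses) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic, the array shapes, the regularization strength `alpha`, the train/test split, and the helper names `fit_ridge`, `per_word_mse`, and `per_channel_pearson` are all hypothetical, and a single time point per word is used for simplicity.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression (no intercept): W = (X^T X + alpha I)^-1 X^T Y."""
    n_features = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ Y)

def per_word_mse(Y_true, Y_pred):
    """Mean squared error for each word, averaged over channels."""
    return np.mean((Y_true - Y_pred) ** 2, axis=1)

def per_channel_pearson(Y_true, Y_pred):
    """Pearson correlation for each channel, computed across words."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    num = (yt * yp).sum(axis=0)
    den = np.sqrt((yt ** 2).sum(axis=0) * (yp ** 2).sum(axis=0))
    return num / den

# Synthetic stand-ins (assumption): rows are words, X holds LM embeddings,
# Y holds sensor responses at one time point after word onset.
rng = np.random.default_rng(0)
n_words, n_dims, n_channels = 400, 16, 8
X = rng.standard_normal((n_words, n_dims))
W_true = rng.standard_normal((n_dims, n_channels))
Y = X @ W_true + 0.5 * rng.standard_normal((n_words, n_channels))

# Fit on the first 300 words, evaluate on the held-out 100.
X_train, X_test = X[:300], X[300:]
Y_train, Y_test = Y[:300], Y[300:]
W = fit_ridge(X_train, Y_train, alpha=10.0)
Y_pred = X_test @ W

mse = per_word_mse(Y_test, Y_pred)        # one error value per held-out word
r = per_channel_pearson(Y_test, Y_pred)   # one correlation per channel

# Words with the largest error are candidates for "divergent" examples.
most_divergent = np.argsort(mse)[::-1][:5]
```

In a setup like this, averaging `r` over channels gives a single score per layer and time point, as in Figure 3, while the words ranked highest by `mse` are the kind of poorly predicted examples that a hypothesis proposer could then be asked to describe.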
