Divergences between Language Models and Human Brains
Yuchen Zhou, Emmy Liu, Graham Neubig, Michael J. Tarr, Leila Wehbe
TL;DR
The paper investigates how language-model (LM) representations diverge from human brain responses during language processing, using MEG data recorded while subjects read and listened to narratives. It introduces a data-driven pipeline that predicts MEG signals from LM embeddings via ridge regression, identifies divergences through an automatic hypothesis proposer, and validates two domains that LMs capture poorly: social/emotional intelligence and physical commonsense. Behavioral experiments corroborate these findings, and domain-specific fine-tuning on Social IQa and PIQA improves brain alignment within language-processing time windows. The work highlights concrete gaps in LM representations and shows that targeted fine-tuning can bridge some of them. It remains limited by the scope of the narratives examined and points to broader datasets for future exploration.
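The ridge-regression encoding step mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, and the names `ridge_fit` and `pearson_per_channel`, the array sizes, the regularization strength `alpha=1.0`, and the per-word error ranking are all assumptions chosen for illustration. The idea is to fit a linear map from word embeddings to sensor responses, score held-out predictions per channel, and then list the words the model predicts worst as candidates for closer inspection.

```python
import numpy as np


def ridge_fit(X, Y, alpha):
    """Closed-form ridge weights: W = (X^T X + alpha * I)^{-1} X^T Y."""
    n_features = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ Y)


def pearson_per_channel(Y_true, Y_pred):
    """Pearson correlation between true and predicted signals, per channel."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    denom = np.sqrt((yt ** 2).sum(axis=0) * (yp ** 2).sum(axis=0))
    return (yt * yp).sum(axis=0) / denom


# Synthetic stand-ins (assumptions): one row per word, zero-mean by construction,
# so no intercept term is fitted.
rng = np.random.default_rng(0)
n_words, n_dims, n_channels = 500, 16, 8
embeddings = rng.standard_normal((n_words, n_dims))      # hypothetical LM embeddings
true_map = rng.standard_normal((n_dims, n_channels))
meg = embeddings @ true_map + 0.5 * rng.standard_normal((n_words, n_channels))

# Hold out the last 100 words for evaluation.
split = 400
X_train, X_test = embeddings[:split], embeddings[split:]
Y_train, Y_test = meg[:split], meg[split:]

W = ridge_fit(X_train, Y_train, alpha=1.0)
Y_pred = X_test @ W

# Per-channel alignment score on held-out words.
scores = pearson_per_channel(Y_test, Y_pred)

# Per-word prediction error; the worst-predicted words are candidates for
# inspection (an illustrative proxy, not the paper's hypothesis proposer).
word_mse = ((Y_test - Y_pred) ** 2).mean(axis=1)
worst_words = np.argsort(word_mse)[::-1][:10]

print("mean held-out correlation:", scores.mean())
print("indices of worst-predicted held-out words:", worst_words)
```

In practice the regularization strength would typically be chosen by cross-validation, and the regression would be fitted separately for each time window after word onset; both are omitted here to keep the sketch short.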
Abstract
Do machines and humans process language in similar ways? Recent research has hinted at the affirmative, showing that human neural activity can be effectively predicted using the internal representations of language models (LMs). Although such results are thought to reflect shared computational principles between LMs and human brains, there are also clear differences in how LMs and humans represent and use language. In this work, we systematically explore the divergences between human and machine language processing by examining the differences between LM representations and human brain responses to language as measured by magnetoencephalography (MEG) across two datasets in which subjects read and listened to narrative stories. Using an LLM-based data-driven approach, we identify two domains that LMs do not capture well: social/emotional intelligence and physical commonsense. We validate these findings with human behavioral experiments and hypothesize that the gap is due to insufficient representations of social/emotional and physical knowledge in LMs. Our results show that fine-tuning LMs on these domains can improve their alignment with human brain responses.
