Instruction-tuning Aligns LLMs to the Human Brain
Khai Loong Aw, Syrielle Montariol, Badr AlKhamissi, Martin Schrimpf, Antoine Bosselut
TL;DR
This study investigates whether instruction-tuning aligns large language models (LLMs) with human language processing in the brain and in behavior. Using 25 LLMs (vanilla and instruction-tuned) and three neural datasets of human reading, the authors measure brain alignment with Brain-Score’s linear predictivity and behavioral alignment via correlations between LLM perplexity and human reading times. They find that instruction-tuning improves brain alignment by $6.2\%$ on average, and that brain alignment correlates strongly with model size ($r = 0.95$) and with performance on world-knowledge tasks ($r = 0.81$). In contrast, behavioral alignment with human reading times shows little to no improvement and does not correlate reliably with world knowledge or model size, suggesting that distinct factors drive neural and behavioral alignment. The results imply that the world-knowledge representations encoded by LLMs contribute to brain-like language processing, informing both NLP model design and neuroscience research on LLM–human alignment.
Abstract
Instruction-tuning is a widely adopted finetuning method that enables large language models (LLMs) to generate output that more closely resembles human responses. However, no studies have shown that instruction-tuning actually teaches LLMs to process language in a similar manner as humans. We investigate the effect of instruction-tuning on aligning LLM and human language processing mechanisms in two ways: (1) brain alignment, the similarity of LLM internal representations to neural activity in the human language system, and (2) behavioral alignment, the similarity of LLM and human behavior on a reading task. We assess 25 vanilla and instruction-tuned LLMs on three datasets involving humans reading naturalistic stories and sentences, and find that instruction-tuning generally enhances brain alignment (~6%), but has no similar effect on behavioral alignment. To identify factors underlying this improvement in brain alignment, we compute correlations between brain alignment and various LLM properties, such as model size, problem-solving, and world knowledge understanding. Notably, we find a strong positive correlation between brain alignment and model size (r = 0.95), as well as performance on tasks requiring world knowledge (r = 0.81). Our results demonstrate that instruction-tuning LLMs improves both world knowledge representations and brain alignment, suggesting that the mechanisms that encode world knowledge in LLMs also improve representational alignment to the human brain.
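The two alignment measures described above can be illustrated with a minimal sketch. It is not the Brain-Score implementation or the authors' code. It assumes that linear predictivity means fitting a regularized linear map from model activations to neural responses and scoring held-out predictions with Pearson correlation, and that behavioral alignment is a Pearson correlation between per-word model perplexity and human reading times. The function names, the ridge penalty `alpha`, the fold count, and the choice to average across recording targets are all hypothetical choices made for illustration. Normalization by a noise ceiling is omitted.

```python
import numpy as np


def pearson_r(a, b):
    """Pearson correlation between two 1-D sequences (0.0 if either is constant)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a ** 2).sum() * (b ** 2).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def linear_predictivity(activations, responses, n_splits=5, alpha=1.0, seed=0):
    """Cross-validated linear predictivity (illustrative sketch).

    activations: (n_stimuli, n_features) model representations
    responses:   (n_stimuli, n_targets) neural recordings (e.g. voxels)
    Returns the mean, across targets, of the Pearson correlation between
    held-out predictions and measured responses.
    """
    X = np.asarray(activations, dtype=float)
    Y = np.asarray(responses, dtype=float)
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_splits)
    preds = np.zeros_like(Y)
    for test_idx in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[test_idx] = False
        X_tr, Y_tr = X[train_mask], Y[train_mask]
        # Center using training statistics only, to avoid test leakage.
        x_mean, y_mean = X_tr.mean(axis=0), Y_tr.mean(axis=0)
        Xc = X_tr - x_mean
        # Ridge closed form: W = (Xc^T Xc + alpha I)^-1 Xc^T (Y_tr - y_mean)
        gram = Xc.T @ Xc + alpha * np.eye(X.shape[1])
        W = np.linalg.solve(gram, Xc.T @ (Y_tr - y_mean))
        preds[test_idx] = (X[test_idx] - x_mean) @ W + y_mean
    # Score pooled held-out predictions separately for each target.
    scores = [pearson_r(preds[:, j], Y[:, j]) for j in range(Y.shape[1])]
    return float(np.mean(scores))


def behavioral_alignment(perplexities, reading_times):
    """Correlation between per-word model perplexity and human reading time."""
    return pearson_r(perplexities, reading_times)
```

In this sketch, a model whose activations linearly explain the recordings scores close to 1, while unrelated activations score near 0. The same `pearson_r` helper could be applied across models to relate brain alignment scores to model size or benchmark performance, which is the kind of correlation the abstract reports.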
