Instruction-tuning Aligns LLMs to the Human Brain

Khai Loong Aw; Syrielle Montariol; Badr AlKhamissi; Martin Schrimpf; Antoine Bosselut

TL;DR

This study investigates whether instruction-tuning aligns large language models (LLMs) with human language processing, both in the brain and in behavior. Using 25 LLMs (vanilla and instruction-tuned) and three neural datasets of humans reading, the authors measure brain alignment with Brain-Score's linear predictivity, and behavioral alignment as the correlation between per-word LLM perplexity and human reading times. They find that instruction-tuning improves brain alignment by $6.2\%$ on average, and that brain alignment strongly tracks world knowledge (MMLU, $r = 0.81$) and model size ($r = 0.95$). In contrast, instruction-tuning yields little to no improvement in behavioral alignment, which does not correlate reliably with world knowledge or model size, suggesting that distinct factors drive neural and behavioral alignment. The results imply that the world-knowledge representations encoded by LLMs contribute to brain-like language processing, informing both NLP model design and neuroscience research on LLM–human alignment.
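As a rough illustration of linear predictivity, the sketch below fits an ordinary least-squares map from LLM activations to fMRI voxel responses on training folds, predicts the held-out stimuli, and scores each voxel by the Pearson correlation between predicted and actual responses. It is a minimal sketch under stated assumptions, not Brain-Score's implementation: the function names, the 5-fold random split, and the median over voxels are our own choices, and, to our understanding, the actual benchmarks use their own split schemes and additionally normalize scores by a noise ceiling, which is omitted here. The data in the usage example are synthetic.

```python
import numpy as np


def pearson_per_column(a, b):
    """Pearson r between matching columns of two (n_stimuli x n_voxels) arrays."""
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    return (a * b).sum(axis=0) / np.sqrt((a**2).sum(axis=0) * (b**2).sum(axis=0))


def linear_predictivity(activations, fmri, n_folds=5, seed=0):
    """Hypothetical helper (not the Brain-Score API): cross-validated OLS
    regression from activations to voxels, scored by held-out Pearson r."""
    n_stimuli = activations.shape[0]
    order = np.random.default_rng(seed).permutation(n_stimuli)
    predictions = np.zeros(fmri.shape, dtype=float)
    for test_idx in np.array_split(order, n_folds):
        train_idx = np.setdiff1d(order, test_idx)
        # Append a column of ones so the linear map includes an intercept.
        x_train = np.column_stack([activations[train_idx], np.ones(len(train_idx))])
        x_test = np.column_stack([activations[test_idx], np.ones(len(test_idx))])
        weights, *_ = np.linalg.lstsq(x_train, fmri[train_idx], rcond=None)
        # Each stimulus is predicted only by a model that never saw it.
        predictions[test_idx] = x_test @ weights
    # One correlation per voxel, summarized by the median across voxels.
    return float(np.median(pearson_per_column(predictions, fmri)))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    acts = rng.normal(size=(200, 16))  # synthetic: 200 stimuli x 16 features
    voxels = acts @ rng.normal(size=(16, 10)) + 0.5 * rng.normal(size=(200, 10))
    print(linear_predictivity(acts, voxels))
```

Because the synthetic voxels are a noisy linear function of the activations, the score comes out close to 1; replacing them with noise unrelated to the activations drives it toward 0.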

Abstract

Instruction-tuning is a widely adopted finetuning method that enables large language models (LLMs) to generate output that more closely resembles human responses. However, no studies have shown that instruction-tuning actually teaches LLMs to process language in a manner similar to humans. We investigate the effect of instruction-tuning on aligning LLM and human language processing mechanisms in two ways: (1) brain alignment, the similarity of LLM internal representations to neural activity in the human language system, and (2) behavioral alignment, the similarity of LLM and human behavior on a reading task. We assess 25 vanilla and instruction-tuned LLMs on three datasets involving humans reading naturalistic stories and sentences, and find that instruction-tuning generally enhances brain alignment (~6%), but has no similar effect on behavioral alignment. To identify factors underlying this improvement in brain alignment, we compute correlations between brain alignment and various LLM properties, such as model size, problem-solving, and world knowledge understanding. Notably, we find a strong positive correlation between brain alignment and model size (r = 0.95), as well as performance on tasks requiring world knowledge (r = 0.81). Our results demonstrate that instruction-tuning LLMs improves both world knowledge representations and brain alignment, suggesting that the mechanisms that encode world knowledge in LLMs also improve representational alignment to the human brain.

Paper Structure

This paper contains 33 sections, 9 figures, and 10 tables.

Introduction
Background & Related Work
Language Models
Brain Alignment
Instruction-tuning aligns LLM representations to human brain activity
Factors underlying LLM-brain alignment
Behavioral Alignment
Instruction-tuning generally does not improve behavioral alignment
Factors underlying behavioral alignment
Discussion
Implications for NLP: Building LLMs
Implications for Neuroscience: Studying LLM-Human Alignment
Limitations and Future Work
Conclusion
Reproducibility Statement
...and 18 more sections

Figures (9)

Figure 1: Instruction-tuning aligns LLM representations to human brain activity. (A) The same language stimuli are presented to LLMs and human participants. Next, we fit a linear regression model from LLM layer activations to fMRI responses in the human language system. We apply this linear model to predict held-out fMRI responses from the original corpus of recordings, and compute brain alignment as the Pearson correlation between the predicted and actual fMRI responses. We evaluate 25 vanilla and instruction-tuned LLMs with sizes between 77M and 33B parameters, and average brain alignment across 3 neural datasets of humans reading naturalistic stories and sentences: Pereira2018, Blank2014, and Wehbe2014. (B) Instruction-tuning improves brain alignment by 6.2% on average. Each point above the identity line is an instruction-tuned LLM with greater brain alignment than its vanilla version. Error bars, here and elsewhere, represent the median absolute deviation over human participants. (C) Instruction-tuning improves average brain alignment on all three datasets. (D) We instruction-tune LLaMA-7B on the Alpaca dataset ("Instruction" model) and train an ablation model on the same data, but without the instruction in each training sample ("No Instruction" model). Our results show that the brain alignment improvements are due to both (1) the training data (present in both models) and (2) training LLMs to understand and follow instructions (present only in the first model).
Figure 2: World knowledge and model size are important factors underlying LLM-brain alignment. To identify factors underlying brain alignment, we test Pearson correlations between brain alignment and various LLM properties, such as model size, world knowledge in various domains (MMLU benchmark), and various types of problem-solving abilities (BBH benchmark). In the figure, insets display results on individual datasets, with stars reflecting statistical significance (n.s. = p > 0.05, * = p < 0.05, ** = p < 0.005, etc.). (A) Brain alignment is significantly and strongly correlated with world knowledge as evaluated by the MMLU Overall score (r = 0.81), which reports the mean performance across all MMLU subjects. (B) Brain alignment is significantly and strongly correlated with performance on the world knowledge task category in BBH (r = 0.68). (C) Brain alignment is significantly and strongly correlated with model size (logarithm of the number of parameters) (r = 0.95). In the Appendix, we provide a larger version of this figure with labels for each data point.
Figure 3: Instruction-tuning LLMs generally does not improve behavioral alignment to human reading times. Furthermore, behavioral alignment correlates poorly with all other tested measures: world knowledge, model size, and next-word prediction (NWP) ability. To compute behavioral alignment, we use the Futrell2018 benchmark in Brain-Score. The same language stimuli (naturalistic stories) are presented to LLMs and human participants. We then compute behavioral alignment as the Pearson correlation between per-word LLM perplexity and human reading times. (A) Instruction-tuning does not generally improve behavioral alignment. Moreover, behavioral alignment is weakly and non-significantly correlated with each of the other measures: (B) world knowledge (p = 0.76), (C) model size (p = 0.31), (D) NWP loss for T5 models (p = 0.54), and (E) NWP loss for LLaMA models (p = 0.21). In the Appendix, we provide a larger, labeled version of this figure.
Figure 4: Method for computing Brain alignment, the similarity of an LLM’s internal representations to human brain activity.
Figure 5: Method for computing Behavioral alignment. The same language stimuli are presented to LLMs and human participants, using the Futrell2018 benchmark in Brain-Score, which contains naturalistic stories. We compute the behavioral alignment as the Pearson correlation between LLM perplexity for each word and human reading times for each word.
...and 4 more figures
