Do Large Language Models Mirror Cognitive Language Processing?

Yuqi Ren, Renren Jin, Tongxuan Zhang, Deyi Xiong

TL;DR

This paper addresses whether large language models mirror human cognitive language processing by evaluating cross-modal representations against fMRI data using Representational Similarity Analysis (RSA). It introduces a framework that computes Representational Dissimilarity Matrices (RDMs) for both brain signals and LLM embeddings and compares them using multiple similarity metrics. It also explores how pre-training data size, model scaling, alignment training, and prompt strategies affect this alignment. Key findings show that larger pre-training data, greater model scale, alignment training, and explicit prompts increase LLM-brain similarity, and that LLM-brain similarity is notably higher for positive-sentiment text. LLM-brain similarity also correlates with several standard LLM evaluations. The results suggest that brain-aligned representations can serve as a proxy for LLM capabilities and offer neuroscience-informed insights into how large models process language.
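The RSA comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. Each row of `brain_reps` and `llm_reps` is assumed to be one sentence's representation (an fMRI response or a pooled LLM embedding). RDM entries are assumed to be 1 minus the Pearson correlation between two sentences' representations. The two RDMs are then compared by Pearson correlation over their upper triangles. The function names (`pearson`, `rdm`, `upper_triangle`, `rsa_similarity`) and the toy data are hypothetical, and the paper's other similarity metrics are not reproduced here.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length sequences (assumes non-zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rdm(reps):
    """Assumed RDM definition: entry (i, j) = 1 - Pearson(reps[i], reps[j])."""
    n = len(reps)
    return [
        [0.0 if i == j else 1.0 - pearson(reps[i], reps[j]) for j in range(n)]
        for i in range(n)
    ]


def upper_triangle(matrix):
    """Flatten the strictly upper-triangular entries (the diagonal is always 0)."""
    return [matrix[i][j] for i, j in combinations(range(len(matrix)), 2)]


def rsa_similarity(brain_reps, llm_reps):
    """Second-order similarity: correlate the two RDMs, not the raw representations.

    The modalities may have different dimensionalities; only the number of
    sentences (rows) must match.
    """
    if len(brain_reps) != len(llm_reps):
        raise ValueError("both modalities must cover the same sentences")
    return pearson(upper_triangle(rdm(brain_reps)), upper_triangle(rdm(llm_reps)))


# Hypothetical toy data: 4 sentences, 3 "voxels" per brain response, 5 dims per LLM embedding.
brain = [[1.0, 2.0, 3.0], [2.0, 1.0, 0.5], [0.0, 1.0, 4.0], [3.0, 3.0, 1.0]]
llm = [
    [0.1, 0.4, 0.9, 0.2, 0.7],
    [0.8, 0.3, 0.1, 0.5, 0.2],
    [0.0, 0.2, 1.0, 0.1, 0.9],
    [0.9, 0.8, 0.2, 0.6, 0.1],
]
print(round(rsa_similarity(brain, llm), 3))
```

Because only the RDMs are compared, RSA does not need a learned mapping between fMRI voxels and embedding dimensions. With real data, a rank-based coefficient such as Spearman correlation is a common alternative for the final RDM comparison.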

Abstract

Large Language Models (LLMs) have demonstrated remarkable abilities in text comprehension and logical reasoning, indicating that the text representations learned by LLMs can facilitate their language processing capabilities. In neuroscience, brain cognitive processing signals are typically used to study human language processing. It is therefore natural to ask how well the text embeddings from LLMs align with brain cognitive processing signals, and how training strategies affect this LLM-brain alignment. In this paper, we employ Representational Similarity Analysis (RSA) to measure the alignment between 23 mainstream LLMs and fMRI signals of the brain to evaluate how effectively LLMs simulate cognitive language processing. We empirically investigate the impact of various factors (e.g., pre-training data size, model scaling, alignment training, and prompts) on such LLM-brain alignment. Experimental results indicate that pre-training data size and model scaling are positively correlated with LLM-brain similarity, and that alignment training can significantly improve LLM-brain similarity. Explicit prompts contribute to the consistency of LLMs with brain cognitive language processing, while nonsensical noisy prompts may attenuate such alignment. Additionally, performance on a wide range of LLM evaluations (e.g., MMLU, Chatbot Arena) is highly correlated with LLM-brain similarity.

Paper Structure

This paper contains 30 sections, 2 equations, 4 figures, and 4 tables.

Figures (4)

  • Figure 1: Diagram of the proposed LLM-brain similarity estimation framework. 'H' denotes sentence representations from different modalities, '$\rho$' denotes the Pearson correlation coefficient, and 'S' denotes the similarity measurement method.
  • Figure 2: The LLM-brain similarity of 10 different checkpoints of Amber. "ckpt" is the abbreviation for "checkpoint".
  • Figure 4: The LLM-brain similarity, calculated by the Pearson correlation coefficient, of LLMs across different sentiment polarities.
  • Figure 5: Correlation between the performance of LLMs on evaluations and the LLM-brain similarity calculated by the Pearson correlation coefficient.
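Figure 5 relates each LLM's evaluation performance to its LLM-brain similarity. The sketch below shows one way such a cross-model correlation could be computed, assuming one similarity score and one benchmark score per model. The source does not state which coefficient is used for this step, so both Pearson and rank-based Spearman (with average ranks for ties) are shown. All function names and the toy numbers are hypothetical and are not results from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences (assumes non-zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def brain_benchmark_correlation(brain_similarity, benchmark_scores):
    """Hypothetical helper: correlate per-model LLM-brain similarity with per-model scores."""
    if len(brain_similarity) != len(benchmark_scores):
        raise ValueError("need one similarity and one score per model")
    return {
        "pearson": pearson(brain_similarity, benchmark_scores),
        "spearman": spearman(brain_similarity, benchmark_scores),
    }


# Made-up values for five hypothetical models (not data from the paper).
sims = [0.12, 0.15, 0.18, 0.18, 0.25]
scores = [40.0, 52.0, 55.0, 61.0, 70.0]
print(brain_benchmark_correlation(sims, scores))
```

Pearson measures linear agreement, while Spearman only asks whether models are ranked in the same order. The rank-based view is less sensitive to a single model with an outlying benchmark score.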