To Words and Beyond: Probing Large Language Models for Sentence-Level Psycholinguistic Norms of Memorability and Reading Times

Thomas Hikaru Clark; Carlos Arriaga; Javier Conde; Gonzalo Martínez; Pedro Reviriego

Abstract

Large Language Models (LLMs) have recently been shown to produce estimates of psycholinguistic norms, such as valence, arousal, or concreteness, for words and multiword expressions that correlate with human judgments. These estimates are obtained by prompting an LLM, in zero-shot fashion, with a question similar to those used in human studies. For other norms, such as lexical decision time or age of acquisition, LLMs require supervised fine-tuning to obtain results that align with ground-truth values. In this paper, we extend this approach to the previously unstudied features of sentence memorability and reading times, which involve the relationships among multiple words in a sentence-level context. Our results show that, via fine-tuning, models can provide estimates that correlate with human-derived norms and exceed the predictive power of interpretable baseline predictors, demonstrating that LLMs contain useful information about sentence-level features. At the same time, our results show very mixed zero-shot and few-shot performance, providing further evidence that care is needed when using LLM prompting as a proxy for human cognitive measures.
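As an illustration of what "estimates that correlate with human-derived norms" means in practice, the following is a minimal sketch of scoring model estimates against human norms with a Pearson correlation. The choice of Pearson's r, the function name `pearson_r`, and the numeric values are assumptions made for illustration only; they are not taken from the paper, whose exact correlation measure is not stated in this section.

```python
import math


def pearson_r(predictions, norms):
    """Pearson correlation between model estimates and human-derived norms.

    Hypothetical helper for illustration; not the paper's own code.
    """
    if len(predictions) != len(norms) or len(predictions) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    n = len(predictions)
    mean_p = sum(predictions) / n
    mean_h = sum(norms) / n
    # Sum of cross-products and sums of squares around the means.
    cov = sum((p - mean_p) * (h - mean_h) for p, h in zip(predictions, norms))
    var_p = sum((p - mean_p) ** 2 for p in predictions)
    var_h = sum((h - mean_h) ** 2 for h in norms)
    if var_p == 0 or var_h == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_h)


if __name__ == "__main__":
    # Made-up values: one model estimate and one human norm per sentence.
    model_estimates = [0.62, 0.71, 0.55, 0.80, 0.67]
    human_norms = [0.60, 0.75, 0.50, 0.85, 0.64]
    print(f"r = {pearson_r(model_estimates, human_norms):.3f}")
```

The same function could be applied to the outputs of each evaluation regime (zero-shot, few-shot, fine-tuned) and to a baseline predictor, so that the resulting coefficients are directly comparable.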

Paper Structure (24 sections, 6 figures, 2 tables)

This paper contains 24 sections, 6 figures, 2 tables.

Introduction
Methods
Data
Word Memorability
Sentence Memorability
Self-Paced Reading Times
Eye-Tracking Reading Times
Model Evaluation
Zero-Shot
Few-Shot
Fine-Tuning
Correlation Analysis
Baselines
Results
Word Memorability
...and 9 more sections

Figures (6)

Figure 1: Overview of our approach, contrasting zero-shot prompting with fine-tuning on small supervised datasets for predicting psycholinguistic norms.
Figure 2: The correlation between model predictions and ground truth norms for word memorability, across 3 model families and in 3 evaluation regimes. For comparison, the mean correlation with the predictions of interpretable baseline predictors is included.
Figure 3: The correlation between model predictions and ground truth norms for sentence memorability, across 3 model families and in 3 evaluation regimes. For comparison, the mean correlation with the predictions of interpretable baseline predictors is included.
Figure 4: The correlation between model predictions and ground truth norms for self-paced reading times (Natural Stories corpus), across 3 model families and in 3 evaluation regimes. For comparison, the mean correlation with the predictions of interpretable baseline predictors is included.
Figure 5: The correlation between model predictions and ground truth norms for eye-tracking reading times (OneStop corpus), across 3 model families and in 3 evaluation regimes. For comparison, the mean correlation with the predictions of interpretable baseline predictors is included.
...and 1 more figures
