Language models emulate certain cognitive profiles: An investigation of how predictability measures interact with individual differences

Patrick Haller, Lena S. Bolliger, Lena A. Jäger

TL;DR

The study addresses the limitation of group-level analyses by examining how the predictive power (measured as $\Delta_\mathrm{LL}$) of surprisal and contextual entropy from several German LMs varies with individual cognitive capacities, as measured by 13 psychometric tests. It uses five autoregressive LMs (GPT-2 base/large, Llama 2 7B/13B, Mixtral) to generate word- and subword-level surprisal and entropy, and fits linear mixed models with interactions between predictability measures and psychometric scores, evaluated with 10-fold cross-validation on InDiCo reading-time data. The main findings are that incorporating cognitive capacities improves reading-time prediction in most cases, that readers who score high on the psychometric tests generally show weaker predictability effects, and that the LMs tend to emulate readers with lower verbal intelligence, so their predictability estimates are less accurate for readers with high verbal intelligence. These insights advance understanding of how cognition and LM-derived predictability jointly shape reading, with implications for tailoring NLP tools and interpreting model-based psycholinguistic results.
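To make the evaluation idea concrete, the following is a minimal sketch of how a cross-validated $\Delta_\mathrm{LL}$ could be computed for an interaction between surprisal and a psychometric score. It rests on several assumptions that are not taken from the paper: $\Delta_\mathrm{LL}$ is taken to be the mean per-observation difference in held-out log-likelihood between a target and a baseline regression, ordinary least squares with a Gaussian likelihood stands in for the linear mixed models (no random effects), and the data, the function names (`fit_ols`, `gaussian_loglik`, `delta_ll`), and all coefficients are synthetic and hypothetical.

```python
import numpy as np


def gaussian_loglik(y, y_hat, sigma):
    """Per-observation Gaussian log-likelihood of y given predictions y_hat."""
    return -0.5 * np.log(2 * np.pi * sigma**2) - (y - y_hat) ** 2 / (2 * sigma**2)


def fit_ols(X, y):
    """Least-squares fit; returns coefficients and residual standard deviation."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, np.sqrt(np.mean(resid**2))


def delta_ll(X_base, X_target, y, n_folds=10, seed=0):
    """Mean held-out log-likelihood difference (target minus baseline)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    deltas = np.empty(len(y))
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        ll = []
        for X in (X_base, X_target):
            beta, sigma = fit_ols(X[train], y[train])
            ll.append(gaussian_loglik(y[test_idx], X[test_idx] @ beta, sigma))
        deltas[test_idx] = ll[1] - ll[0]
    return deltas.mean()


# Synthetic data (assumption): reading times depend on word length, surprisal,
# and a surprisal-by-score interaction, so that higher scores weaken the
# surprisal effect.
rng = np.random.default_rng(42)
n = 2000
length = rng.integers(2, 12, size=n).astype(float)
surprisal = rng.gamma(shape=2.0, scale=2.0, size=n)
score = rng.normal(size=n)  # standardized psychometric score
rt = 200 + 10 * length + 15 * surprisal - 5 * surprisal * score + rng.normal(0, 20, size=n)

ones = np.ones(n)
X_base = np.column_stack([ones, length, surprisal, score])
X_target = np.column_stack([ones, length, surprisal, score, surprisal * score])

d = delta_ll(X_base, X_target, rt)
print(f"Delta LL for the surprisal x score interaction: {d:.3f}")
```

A positive value means the model with the interaction term assigns higher likelihood to unseen reading times than the baseline does. A faithful replication would need mixed models with by-subject random effects, which this sketch omits.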

Abstract

To date, most investigations on surprisal and entropy effects in reading have been conducted on the group level, disregarding individual differences. In this work, we revisit the predictive power of surprisal and entropy measures estimated from a range of language models (LMs) on data of human reading times as a measure of processing effort by incorporating information of language users' cognitive capacities. To do so, we assess the predictive power of surprisal and entropy estimated from generative LMs on reading data obtained from individuals who also completed a wide range of psychometric tests. Specifically, we investigate if modulating surprisal and entropy relative to cognitive scores increases prediction accuracy of reading times, and we examine whether LMs exhibit systematic biases in the prediction of reading times for cognitively high- or low-performing groups, revealing what type of psycholinguistic subject a given LM emulates. Our study finds that in most cases, incorporating cognitive capacities increases predictive power of surprisal and entropy on reading times, and that generally, high performance in the psychometric tests is associated with lower sensitivity to predictability effects. Finally, our results suggest that the analyzed LMs emulate readers with lower verbal intelligence, suggesting that for a given target group (i.e., individuals with high verbal intelligence), these LMs provide less accurate predictability estimates.
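For readers unfamiliar with the two predictability measures, here is a minimal sketch using their standard information-theoretic definitions: surprisal is the negative log-probability of the word that actually occurred, and contextual entropy is the expected surprisal over the model's next-word distribution. The toy distribution, the log base (bits), and the function names are assumptions for illustration; the paper's LMs, tokenization, and pooling choices are not reproduced here.

```python
import math


def surprisal(prob: float) -> float:
    """Surprisal in bits of an item that the model assigned probability `prob`."""
    return -math.log2(prob)


def word_surprisal(subword_probs: list[float]) -> float:
    """Word-level surprisal as the sum of its subword surprisals (chain rule)."""
    return sum(surprisal(p) for p in subword_probs)


def contextual_entropy(dist: dict[str, float]) -> float:
    """Entropy in bits of a next-word distribution given the preceding context."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


# Hypothetical next-word distribution after some German context.
next_word = {"Haus": 0.5, "Auto": 0.25, "Buch": 0.25}

print(surprisal(next_word["Haus"]))     # 1.0
print(contextual_entropy(next_word))    # 1.5
print(word_surprisal([0.5, 0.25]))      # 3.0
```

Surprisal depends on the word that was actually read, whereas entropy depends only on the context, which is why the two can be entered as separate predictors of reading times.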

Paper Structure (31 sections, 9 equations, 5 figures, 1 table)

Figures (5)

  • Figure 1: Predictive power of entropy and surprisal on reading times. Combined refers to the regression model where both predictors were included. Higher ${\Delta_\mathrm{LL}}$ indicates higher predictive power.
  • Figure 2: ${\Delta_\mathrm{LL}}$ (mean and 95% CI) for the interactions between psychometric scores and model surprisal or entropy as additional predictors for reading times. Empty dots indicate that the ${\Delta_\mathrm{LL}}$ is not significantly different from zero.
  • Figure 3: Difference in predictive power (PP), $\Delta\mathrm{PP}$ (mean and 95% CI), of surprisal and contextual entropy for reading times. Positive $\Delta\mathrm{PP}$ values indicate higher PP for high-performing individuals; negative values indicate higher PP for low-performing individuals. Empty dots indicate that the $\Delta\mathrm{PP}$ is not significantly different from zero.
  • Figure 4: Psychometric tests conducted with all participants.
  • Figure 5: Correlations between scores of all psychometric tests. Red cells indicate positive correlation coefficients, blue cells negative correlation coefficients. Only significant coefficients are displayed; blank cells indicate that the correlation was not significant at $\alpha=.05$.