The Effect of Surprisal on Reading Times in Information Seeking and Repeated Reading

Keren Gruteke Klein, Yoav Meiri, Omer Shubi, Yevgeni Berzak

Abstract

The effect of surprisal on processing difficulty has been a central topic of investigation in psycholinguistics. Here, we use eyetracking data to examine three language processing regimes that are common in daily life but have not been addressed with respect to this question: information seeking, repeated processing, and the combination of the two. Using standard regime-agnostic surprisal estimates, we find that the prediction of surprisal theory regarding the presence of a linear effect of surprisal on processing times extends to these regimes. However, when using surprisal estimates from regime-specific contexts that match the contexts and tasks given to humans, we find that in information seeking, such estimates do not improve the predictive power for processing times compared to standard surprisals. Further, regime-specific contexts yield near-zero surprisal estimates with no predictive power for processing times in repeated reading. These findings point to misalignments of task and memory representations between humans and current language models, and question the extent to which such models can be used for estimating cognitively relevant quantities. We further discuss theoretical challenges posed by these results.
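
For reference, the linear relation mentioned in the abstract can be written out. This uses the standard definition of surprisal and is not quoted from the paper. The symbols are notation assumed here: $w_i$ is the $i$-th word, $w_{<i}$ the preceding words, $c$ any additional regime-specific context supplied to the language model (empty for standard estimates), and $\alpha$, $\beta$ the intercept and slope of the linear link to reading time.

```latex
s(w_i) = -\log p\left(w_i \mid c,\, w_{<i}\right),
\qquad
\mathrm{RT}(w_i) \approx \alpha + \beta \, s(w_i)
```

Under this reading, a model that assigns probability close to 1 to every word given $c$ yields $s(w_i) \approx 0$ throughout, leaving almost no variation in surprisal for the linear term to explain.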

Paper Structure

This paper contains 18 sections, 3 equations, 11 figures, 2 tables.

Figures (11)

  • Figure 1: (a) GAM fits and (b) $\Delta LL$ for first pass Gaze Duration and Pythia-70m surprisals with standard context, using the linear and non-linear models. '***' $p < 0.001$, '**' $p < 0.01$, '*' $p < 0.05$, '(.)' $p \geq 0.05$. Key results: (a) Approximately linear curves for the non-linear models. (b) No statistically significant differences in the $\Delta LL$ of the linear and non-linear models, with the exception of information seeking in first reading. Smaller $\Delta LL$ in information seeking and repeated reading than in first reading - ordinary reading, for both models.
  • Figure 2: Comparison of GAM fits and $\Delta LL$ for first pass Gaze Duration with surprisal estimates of Pythia-70m from different context types. '***' $p < 0.001$, '**' $p < 0.01$, '*' $p < 0.05$, '(.)' $p \geq 0.05$.
  • Figure A3: Predictive power for reading times across different language models as a function of log-perplexity. Perplexity here is sentence-level perplexity averaged over all sentences in OneStopQA (the 30 articles used for the eye-tracking experiment).
  • Figure A4: (a) GAM fits and (b) $\Delta LL$. Results for linear and non-linear models for different language models. '***' $p \leq 0.001$, '**' $p \leq 0.01$, '*' $p \leq 0.05$, '(.)' $p > 0.05$.
  • ...and 6 more figures
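
The captions above report $\Delta LL$ without defining it on this page. A common reading in this literature is the gain in average per-word log-likelihood of reading times when surprisal is added to a baseline regression. The sketch below illustrates that idea only. It is not the authors' pipeline: it swaps the GAMs for ordinary least squares with a Gaussian likelihood, uses an intercept-only baseline, and evaluates in-sample rather than on held-out data. All function names and the toy numbers are hypothetical.

```python
import math


def surprisal(prob):
    """Surprisal in bits of a word with conditional probability `prob`."""
    return -math.log2(prob)


def fit_line(xs, ys):
    """Least-squares intercept and slope for y ~ a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        # A constant predictor carries no information: fall back to the mean.
        return mean_y, 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mean_gaussian_ll(ys, preds):
    """Average per-word Gaussian log-likelihood, using the MLE residual variance."""
    n = len(ys)
    var = sum((y - p) ** 2 for y, p in zip(ys, preds)) / n
    return -0.5 * (math.log(2 * math.pi * var) + 1.0)


def delta_ll(surprisals, rts):
    """Per-word log-likelihood gain of 'intercept + surprisal' over 'intercept only'."""
    mean_rt = sum(rts) / len(rts)
    baseline = mean_gaussian_ll(rts, [mean_rt] * len(rts))
    a, b = fit_line(surprisals, rts)
    target = mean_gaussian_ll(rts, [a + b * s for s in surprisals])
    return target - baseline


# Toy data (invented for illustration): word probabilities and reading times in ms.
probs = [0.5, 0.25, 0.125, 0.0625]
rts = [210.0, 228.0, 252.0, 270.0]
informative = delta_ll([surprisal(p) for p in probs], rts)
uninformative = delta_ll([0.0, 0.0, 0.0, 0.0], rts)
print(informative > uninformative)
```

In this toy setup, surprisals that track reading times give a positive $\Delta LL$, while surprisals that are identical for every word give a $\Delta LL$ of exactly zero, since the fitted line reduces to the baseline mean.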