N-gram-like Language Models Predict Reading Time Best

James A. Michaelov, Roger P. Levy

TL;DR

The neural language models whose predictions correlate most strongly with n-gram probability are also the ones whose probabilities correlate most strongly with eye-tracking-based metrics of reading time on naturalistic text.

Abstract

Recent work has found that contemporary language models such as transformers can become so good at next-word prediction that the probabilities they calculate become worse for predicting reading time. In this paper, we propose that this can be explained by reading time being sensitive to simple n-gram statistics rather than the more complex statistics learned by state-of-the-art transformer language models. We demonstrate that the neural language models whose predictions are most correlated with n-gram probability are also those that calculate probabilities that are the most correlated with eye-tracking-based metrics of reading time on naturalistic text.
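The comparison described in the abstract rests on two quantities: per-word surprisal (negative log probability) and the correlation between two sets of per-word values. The following is a minimal sketch of both, not the authors' pipeline. The bigram order, the add-alpha smoothing, the use of Pearson correlation, and the function names `bigram_surprisals` and `pearson` are all illustrative assumptions.

```python
import math
from collections import Counter


def bigram_surprisals(train_tokens, test_tokens, alpha=1.0):
    """Surprisal in bits of each test token after the first.

    Assumption: an add-alpha smoothed bigram model stands in for the
    n-gram model; the paper's actual order and smoothing may differ.
    """
    vocab_size = len(set(train_tokens) | set(test_tokens))
    # Count how often each word appears as a context (every token but the last).
    context_counts = Counter(train_tokens[:-1])
    bigram_counts = Counter(zip(train_tokens, train_tokens[1:]))
    surprisals = []
    for prev, word in zip(test_tokens, test_tokens[1:]):
        prob = (bigram_counts[(prev, word)] + alpha) / (
            context_counts[prev] + alpha * vocab_size
        )
        surprisals.append(-math.log2(prob))
    return surprisals


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences.

    Assumption: Pearson's r is used here purely for illustration.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)
```

Under these assumptions, one would compute `pearson(lm_surprisals, ngram_surprisals)` and `pearson(lm_surprisals, reading_times)` for each language model, then ask whether the two correlations rise and fall together across models. Here `lm_surprisals` and `reading_times` are hypothetical per-word lists aligned to the same tokens.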

Paper Structure (21 sections, 3 figures, 1 table)


Figures (3)

  • Figure 1: The correlation between $n$-gram surprisal and reading time in the Provo Corpus.
  • Figure 2: (A) The relationship between language model surprisal over the course of training and both $n$-gram surprisal and reading time in the Provo Corpus. (B) The relationship between the two sets of correlations.
  • Figure 3: The relationship between the correlation between language model surprisal and $n$-gram surprisal and the correlation between language model surprisal and each measure of reading time in the (A) Provo and (B) GECO datasets.