N-gram-like Language Models Predict Reading Time Best

James A. Michaelov; Roger P. Levy

TL;DR

The neural language models whose predictions correlate most strongly with n-gram probabilities are also those whose probabilities correlate most strongly with eye-tracking-based metrics of reading time on naturalistic text.

Abstract

Recent work has found that contemporary language models such as transformers can become so good at next-word prediction that their probabilities become worse predictors of reading time. In this paper, we propose that this can be explained by reading time being sensitive to simple n-gram statistics rather than to the more complex statistics learned by state-of-the-art transformer language models. We demonstrate that the neural language models whose predictions correlate most strongly with n-gram probabilities are also those whose probabilities correlate most strongly with eye-tracking-based metrics of reading time on naturalistic text.
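The abstract's central quantity is n-gram surprisal, the negative log probability of a word given only the few words before it. As a minimal sketch, assuming a bigram model with add-one (Laplace) smoothing, per-word surprisal in bits can be computed as below. The smoothing method and the helper names (`train_bigram`, `bigram_surprisal`, `sequence_surprisals`) are hypothetical illustrations, not the paper's actual n-gram setup.

```python
import math
from collections import Counter

# Hypothetical helpers: a toy add-one-smoothed bigram model, used only to
# illustrate n-gram surprisal. The paper's actual n-gram estimator may differ.

def train_bigram(tokens):
    """Return vocabulary, context counts, and bigram counts for a token list."""
    vocab = set(tokens)
    contexts = Counter(tokens[:-1])             # times each word serves as a context
    bigrams = Counter(zip(tokens, tokens[1:]))  # counts of adjacent word pairs
    return vocab, contexts, bigrams

def bigram_surprisal(prev, word, vocab, contexts, bigrams):
    """Surprisal in bits, -log2 P(word | prev), with add-one smoothing."""
    p = (bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab))
    return -math.log2(p)

def sequence_surprisals(tokens, model):
    """Per-word bigram surprisal for tokens[1:] (the first word has no context)."""
    vocab, contexts, bigrams = model
    return [bigram_surprisal(prev, word, vocab, contexts, bigrams)
            for prev, word in zip(tokens, tokens[1:])]

if __name__ == "__main__":
    corpus = "the cat sat on the mat".split()
    model = train_bigram(corpus)
    print(sequence_surprisals(corpus, model))
```

Because the context counts exclude the final token, the smoothed probabilities for any observed context sum to one over the vocabulary. A real study would estimate counts on a large training corpus and score held-out reading-time stimuli.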

Paper Structure (21 sections, 3 figures, 1 table)

Introduction
Where does the inverse scaling come from?
Divergence between statistical and subjective probabilities
Training data features
Architectures and inductive biases
Metrics of reading time are sensitive to $n$-gram probabilities
Experiment 1
Method
$n$-grams
Reading Time
Results
Discussion
Experiment 2
Method
Results
...and 6 more sections

Figures (3)

Figure 1: The correlation between $n$-gram surprisal and reading time in the Provo Corpus.
Figure 2: (A) The relationship between language model surprisal over the course of training and both $n$-gram surprisal and reading time in the Provo Corpus. (B) The relationship between the two sets of correlations.
Figure 3: The relationship between (i) the correlation of language model surprisal with $n$-gram surprisal and (ii) the correlation of language model surprisal with each measure of reading time, in the (A) Provo and (B) GECO datasets.
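The analysis described in the Figure 3 caption amounts to a correlation of correlations: for each language model (or training checkpoint), correlate its per-word surprisal with n-gram surprisal and with a reading-time measure, then correlate the two resulting lists across models. The sketch below assumes Pearson correlation over word-aligned lists; the statistic the paper actually uses is not stated here, and the function names are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences.

    Raises ZeroDivisionError if either sequence has zero variance.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def correlation_of_correlations(model_surprisals, ngram_surprisal, reading_time):
    """Hypothetical helper.

    model_surprisals maps a model name to its per-word surprisals, aligned
    word-by-word with ngram_surprisal and reading_time. Returns the per-model
    correlations with n-gram surprisal, the per-model correlations with
    reading time, and the correlation between those two lists across models.
    """
    r_ngram = [pearson(s, ngram_surprisal) for s in model_surprisals.values()]
    r_rt = [pearson(s, reading_time) for s in model_surprisals.values()]
    return r_ngram, r_rt, pearson(r_ngram, r_rt)
```

A positive across-model correlation would correspond to the pattern the abstract describes: models that track n-gram surprisal more closely also track reading time more closely. With only a handful of models, this estimate is noisy.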