Information-Theoretic Storage Cost in Sentence Comprehension

Kohei Kajikawa, Shinnosuke Isono, Ethan Gotlieb Wilcox

Abstract

Real-time sentence comprehension imposes a significant load on working memory, as comprehenders must maintain contextual information to anticipate future input. While measures of such load have played an important role in psycholinguistic theories, they have largely been formalized using symbolic grammars, which assign discrete, uniform costs to syntactic predictions. This study proposes an information-theoretic measure of storage cost in processing: the amount of information that previous words carry about future context under uncertainty. Unlike previous discrete, grammar-based metrics, this measure is continuous, theory-neutral, and can be estimated from pre-trained neural language models. The validity of this approach is demonstrated through three analyses in English: the measure (i) recovers well-known processing asymmetries in center embeddings and relative clauses, (ii) correlates with a grammar-based storage cost in a syntactically annotated corpus, and (iii) predicts reading-time variance in two large-scale naturalistic datasets over and above baseline models with traditional information-based predictors.

Paper Structure

This paper contains 29 sections, 14 equations, and 7 figures.

Figures (7)

  • Figure 1: Illustration of the proposed storage cost measure. It quantifies the predictive potential shared between a target word $w_i$ and the future $\boldsymbol{w}_{[k:N]}$, conditioned on the remaining context. Storage cost is defined as the sum of these predictive potentials across all context words, representing the load of pending information.
  • Figure 2: Illustration of DLT storage cost as the predicted syntactic head hypothesis. In the center-embedded structure (top), the storage cost at "Mary" is five memory units, because five syntactic heads (the blue-colored words) are predicted at "Mary" in order to form a grammatical sentence. In contrast, the right-branching structure (bottom) imposes a minimal storage load. $t$ denotes a trace of extraction. This example is adapted from Chen et al. (2005).
  • Figure 3: Mean information-theoretic storage cost estimated by BERT at each word position. Error bars represent 95% confidence intervals across 30 items. In both the center-embedding and relative-clause comparisons, the more difficult structure (CE and ORC, respectively) exhibits higher storage cost, consistent with behavioral asymmetries.
  • Figure 4: Mean information-theoretic storage cost as a function of DLT storage cost in the UD_English-GUM corpus (Zeldes, 2017). Points represent the mean value for each DLT bin with 95% confidence intervals. For this visualization, bins with fewer than 100 observations were excluded due to data sparsity. The red dashed line represents the linear regression fitted to the raw data corresponding to the displayed bins.
  • Figure 5: Results of the naturalistic reading-time analysis on Natural Stories (NS) and OneStop (OS). Abbreviations: SPR = self-paced reading; FPD = first-pass duration; GPD = go-past duration; TFD = total fixation duration.
  • ...and 2 more figures
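
The definition in the Figure 1 caption (a sum, over context words, of the information each word shares with the future given the rest of the context) can be illustrated on a toy example. The sketch below is not the authors' estimator: the paper estimates the measure from pre-trained neural language models, whereas this code assumes a small, fully known joint probability table. Reading "predictive potential" as a conditional mutual information is also an assumption based on the caption alone, and the names `conditional_mi` and `storage_cost` are hypothetical.

```python
import numpy as np


def conditional_mi(joint, x_axis, y_axis):
    """Return I(X; Y | Z) in bits.

    `joint` is a probability table with one axis per variable; Z is taken
    to be every axis other than `x_axis` and `y_axis`.
    """
    p_xyz = joint / joint.sum()
    p_xz = p_xyz.sum(axis=y_axis, keepdims=True)            # marginalize out Y
    p_yz = p_xyz.sum(axis=x_axis, keepdims=True)            # marginalize out X
    p_z = p_xyz.sum(axis=(x_axis, y_axis), keepdims=True)   # marginalize out X and Y
    numerator = p_xyz * p_z
    denominator = p_xz * p_yz
    mask = p_xyz > 0  # cells with zero probability contribute nothing
    return float(np.sum(p_xyz[mask] * np.log2(numerator[mask] / denominator[mask])))


def storage_cost(joint):
    """Toy storage cost: sum over context words of I(w_i; future | other context).

    Assumption: the last axis of `joint` is the future, and every other
    axis is one context word.
    """
    future_axis = joint.ndim - 1
    return sum(conditional_mi(joint, i, future_axis) for i in range(future_axis))


# Toy case 1: the future is the XOR of two uniform binary context words,
# so each word is informative about the future only given the other.
xor_joint = np.zeros((2, 2, 2))
for w1 in (0, 1):
    for w2 in (0, 1):
        xor_joint[w1, w2, w1 ^ w2] = 0.25

# Toy case 2: the future simply copies the first word; the second is irrelevant.
copy_joint = np.zeros((2, 2, 2))
for w1 in (0, 1):
    for w2 in (0, 1):
        copy_joint[w1, w2, w1] = 0.25

xor_cost = storage_cost(xor_joint)    # 1 bit per word, 2 bits in total
copy_cost = storage_cost(copy_joint)  # 1 bit from the first word only
```

In the XOR case both context words must be held in memory to predict the future, so the toy cost is 2 bits. In the copy case only the first word is needed, so the cost drops to 1 bit. This mirrors the caption's description of storage cost as a load of pending information, though only for this hand-built distribution.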