Towards a Similarity-adjusted Surprisal Theory

Clara Meister, Mario Giulianelli, Tiago Pimentel

TL;DR

Experimental results with reading time data indicate that similarity-adjusted surprisal adds predictive power beyond standard surprisal for certain datasets, suggesting it serves as a complementary measure of comprehension effort.

Abstract

Surprisal theory posits that the cognitive effort required to comprehend a word is determined by its contextual predictability, quantified as surprisal. Traditionally, surprisal theory treats words as distinct entities, overlooking any potential similarity between them. Giulianelli et al. (2023) address this limitation by introducing information value, a measure of predictability designed to account for similarities between communicative units. Our work leverages Ricotta and Szeidl's (2006) diversity index to extend surprisal into a metric that we term similarity-adjusted surprisal, exposing a mathematical relationship between surprisal and information value. Similarity-adjusted surprisal aligns with information value when considering graded similarities and reduces to standard surprisal when words are treated as distinct. Experimental results with reading time data indicate that similarity-adjusted surprisal adds predictive power beyond standard surprisal for certain datasets, suggesting it serves as a complementary measure of comprehension effort.
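To make the relationship described above concrete, the following is a minimal sketch, not the authors' implementation. It assumes that similarity-adjusted surprisal takes the form $-\log \sum_{w'} z(w_t, w')\, p(w' \mid \boldsymbol{w}_{<t})$, which is one reading consistent with the abstract; the paper's exact definition may differ. The toy next-word distribution, the similarity values, and all function names are hypothetical and chosen purely for illustration.

```python
import math

def surprisal(p_next, word):
    """Standard surprisal: -log p(word | context)."""
    return -math.log(p_next[word])

def similarity_adjusted_surprisal(p_next, word, sim):
    """Assumed form: -log of the similarity-weighted probability mass.

    `sim(word, other)` is a hypothetical similarity function in [0, 1]
    with sim(w, w) == 1.
    """
    mass = sum(sim(word, other) * p for other, p in p_next.items())
    return -math.log(mass)

def identity_similarity(a, b):
    """Words treated as fully distinct: similar only to themselves."""
    return 1.0 if a == b else 0.0

# Made-up next-word distribution and pairwise similarity (illustrative only).
p_next = {"dog": 0.5, "puppy": 0.3, "car": 0.2}
toy_sim = {frozenset(("dog", "puppy")): 0.8}

def graded_similarity(a, b):
    """Graded similarity looked up from the toy table; 0 if unlisted."""
    if a == b:
        return 1.0
    return toy_sim.get(frozenset((a, b)), 0.0)

plain = surprisal(p_next, "puppy")
distinct = similarity_adjusted_surprisal(p_next, "puppy", identity_similarity)
graded = similarity_adjusted_surprisal(p_next, "puppy", graded_similarity)

print(f"standard surprisal:           {plain:.3f}")
print(f"adjusted, distinct words:     {distinct:.3f}")
print(f"adjusted, graded similarity:  {graded:.3f}")
```

Under this assumed form, the identity similarity recovers standard surprisal exactly, while a graded similarity lowers the value for "puppy" because probability mass assigned to the similar word "dog" also counts toward its predictability.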

Paper Structure

This paper contains 35 sections, 4 theorems, 22 equations, 1 figure, 4 tables.

Key Result

Theorem 1

Let $d_{\boldsymbol{w}_{<t}} : \mathcal{V} \times \mathcal{V} \to [0, 1]$ and $z_{\boldsymbol{w}_{<t}}(w_t, w') = 1 - d_{\boldsymbol{w}_{<t}}(w_t, w')$. Under these settings, next-word information value and similarity-adjusted surprisal h

Figures (1)

  • Figure 1: The change in reading time dataset log-likelihoods as a function of the temperature parameter used with the semantic-similarity function in similarity-adjusted surprisal computations. Each line corresponds to a different set of predictors added to the regressor. Shaded regions indicate 95% confidence intervals, as computed using standard bootstrapping techniques on our per-fold $\Delta_{\mathcal{L}}$ values.

Theorems & Definitions (9)

  • Definition 1
  • Theorem 1
  • proof
  • Theorem 1
  • proof
  • Theorem 2
  • proof
  • Theorem 3
  • proof