An information-theoretic model of shallow and deep language comprehension

Jiaxuan Li; Richard Futrell

TL;DR

The paper addresses how online language comprehension can be shallow, or "good enough," under resource constraints by formalizing a rate-distortion-style trade-off between processing depth and distortion. It introduces an information-theoretic model in which the interpretation policy $p_t(w|x)$ minimizes expected distortion $\mathbb{E}[d(w,x)]$ subject to a time-varying constraint on processing depth, yielding $p_t(w|x) \propto p_0(w)\, e^{-\lambda_0 t\, d(w,x)}$ with processing depth $D(t) = \mathbb{E}_{p(x)}[D_{KL}(p_t(w|x)\,\|\,p_0(w))]$. The authors connect processing depth and effort to measurable signals, proposing that EEG and reading times reflect the rate of change of depth, $D'(t)$, and demonstrating that the model can simulate N400 and P600 ERP patterns as well as garden-path reading-time effects. Empirical validations on EEG datasets and garden-path reading times show that the framework accounts for both behavioral and neural data, offering a resource-rational account of shallow-to-deep processing and a unified view of language comprehension dynamics.
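As a rough illustration of the formulas in the summary above, the following Python sketch evaluates the policy $p_t(w|x) \propto p_0(w) e^{-\lambda_0 t d(w,x)}$ for a single input, its KL divergence from the prior (the per-input term inside the expectation that defines $D(t)$), and a finite-difference estimate of $D'(t)$. The prior, the distortion values, the time grid, and the function names are all illustrative assumptions, not values or code from the paper.

```python
import numpy as np

def interpretation_policy(prior, distortion, t, lam0=1.0):
    """p_t(w|x) proportional to p_0(w) * exp(-lam0 * t * d(w, x)), for one input x."""
    logits = np.log(prior) - lam0 * t * distortion
    logits -= logits.max()          # stabilize the exponentials
    p = np.exp(logits)
    return p / p.sum()

def processing_depth(prior, distortion, t, lam0=1.0):
    """KL(p_t(.|x) || p_0) in bits for one input x (no averaging over p(x))."""
    p = interpretation_policy(prior, distortion, t, lam0)
    mask = p > 0                    # skip terms that underflow to zero
    return float(np.sum(p[mask] * np.log2(p[mask] / prior[mask])))

# Hypothetical toy setup: three candidate interpretations w of one input x.
prior = np.array([0.6, 0.3, 0.1])        # assumed p_0(w)
distortion = np.array([1.0, 0.0, 2.0])   # assumed d(w, x); index 1 matches the input

times = np.linspace(0.0, 5.0, 51)
depth = np.array([processing_depth(prior, distortion, t) for t in times])
effort = np.gradient(depth, times)       # numerical estimate of D'(t)
```

In this toy setup the policy equals the prior at $t=0$ (zero depth) and concentrates on the lowest-distortion interpretation as $t$ grows, so depth rises from 0 toward $\log_2(1/p_0(w^*))$ for the best-matching interpretation $w^*$. Averaging `processing_depth` over inputs drawn from $p(x)$ would give the full $D(t)$.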

Abstract

A large body of work in psycholinguistics has focused on the idea that online language comprehension can be shallow or "good enough": given constraints on time or available computation, comprehenders may form interpretations of their input that are plausible but inaccurate. However, this idea has not yet been linked with formal theories of computation under resource constraints. Here we use information theory to formulate a model of language comprehension as an optimal trade-off between accuracy and processing depth, formalized as bits of information extracted from the input, which increases with processing time. The model provides a measure of processing effort as the change in processing depth, which we link to EEG signals and reading times. We validate our theory against a large-scale dataset of garden path sentence reading times, and EEG experiments featuring N400, P600 and biphasic ERP effects. By quantifying the timecourse of language processing as it proceeds from shallow to deep, our model provides a unified framework to explain behavioral and neural signatures of language comprehension.

Paper Structure (17 sections, 6 equations, 7 figures, 2 tables)

Introduction
Model
Formal model
Application to language comprehension
Processing effort
Link to EEG measures
Link to reading times
Relation to noisy-channel models
Study 1: N400, P600, and biphasic EEG signals
Dataset
Implementation
Results
Study 2: Garden path reading times
Dataset
Implementation
...and 2 more sections

Figures (7)

Figure 1: Tradeoff between distortion and processing depth (KL divergence) in optimal interpretation policies for the given input. Each location in the white part of the plane represents a possible interpretation policy for the input; the tradeoffs in the gray region are unachievable. The black line shows the efficient frontier of policies that achieve the minimal distortion for a given level of processing depth. We hold that interpretation policies move down this frontier with increasing processing time.
Figure 2: Top. Probabilities of four interpretations $w$ given input $x=\text{"story"}$ in the given context as a function of processing time, as predicted from Eq. with parameters described in the text. Bottom. Measures of processing effort over time for the same input.
Figure 3: Processing timecourses for four different inputs in the given context. "Anecdote" is the control and the most likely completion. "Hearse" represents a semantic anomaly, "anecdotes" represents a syntactic anomaly, and "antidotes" represents a recoverable anomaly---semantically anomalous input that can be easily mistaken for a more likely input.
Figure 4: Simulated EEG signal corresponding to the processing timecourses in Figure , using Eq. with $\omega=1$ and $\phi=-2/3$.
Figure 5: Simulation results for EEG experiments. Left. Instantaneous processing effort ($D'(t)$). Middle. Simulated N400 effect size. Right. Simulated P600 effect size.
...and 2 more figures
