Decomposition of surprisal: Unified computational model of ERP components in language processing

Jiaxuan Li, Richard Futrell

TL;DR

The paper advances an information-theoretic model of human language processing in the brain in which incoming linguistic input is processed first shallowly and later in more depth, with these two kinds of processing corresponding to distinct electroencephalographic signatures.

Abstract

The functional interpretation of language-related ERP components has been a central debate in psycholinguistics for decades. We advance an information-theoretic model of human language processing in the brain in which incoming linguistic input is processed at first shallowly and later with more depth, with these two kinds of information processing corresponding to distinct electroencephalographic signatures. Formally, we show that the information content (surprisal) of a word in context can be decomposed into two quantities: (A) shallow surprisal, which signals shallow processing difficulty for a word and corresponds to the N400 signal; and (B) deep surprisal, which reflects the discrepancy between shallow and deep representations and corresponds to the P600 signal and other late positivities. Both of these quantities can be estimated straightforwardly using modern NLP models. We validate our theory by simulating ERP patterns elicited by a variety of linguistic manipulations in previously reported data from six experiments, with successful novel qualitative and quantitative predictions. Our theory is compatible with traditional cognitive theories that assume a 'good-enough' shallow representation stage, but adds a precise information-theoretic formulation. The theory provides an information-theoretic account of ERP components grounded in cognitive processes, and brings us closer to a fully specified neuro-computational model of language processing.

Paper Structure (33 sections, 25 equations, 6 figures, 9 tables)

Figures (6)

  • Figure 1: Overview of model architecture. The curve represents the tradeoff between distortion and processing depth (KL divergence) in optimal representation policies for the given input. Each location in the white part of the plane represents a possible representation policy for the input; the tradeoffs in the gray region are unachievable. The black line shows the efficient frontier of policies that achieve the minimal distortion for a given level of processing depth. As a comprehender perceives input $x$, the representation policies move down this frontier, increasing depth and decreasing distortion. The total processing depth is equal to surprisal, and can be partitioned into two parts corresponding to N400 and P600 ERP signals. The figure shows the actual curve and representation policies for the given input, using GPT-2 for the initial representation $p_0$ and a distortion metric based on edit distance.
  • Figure 2: N400 and P600 amplitudes from model simulation in AD-98 (a), Kim-05 (b), Ito-16 (c) and Federmeier-07 (d). Stars indicate a significant N400 or P600 in the original underlying studies as reported by the authors.
  • Figure 3: N400 and P600 amplitudes from model simulation and from human ERP experiments in Chow-16R (a-b), Chow-16S (c-d) and Ryskin-21 (e-f).
  • Figure 4: P600 amplitudes simulated with the Error Propagation model using GPT-2.
  • Figure 5: N400 and P600 amplitudes simulated with the counterpart sentences in the other conditions as candidate representations.
  • ...and 1 more figures
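
The partition described in the Figure 1 caption can be illustrated with a toy calculation. The sketch below is not the authors' implementation. It assumes that processing depth is the KL divergence from the initial representation $p_0$ to a representation policy, that the final policy puts all its mass on the perceived word (so total depth equals surprisal), and that the shallow part is the KL divergence of an intermediate policy, with the deep part as the remainder. The function names, the three-word vocabulary, and all probabilities are hypothetical; in the paper $p_0$ comes from GPT-2.

```python
import math


def kl_divergence(q, p):
    """KL(q || p) in bits; q and p are dicts over the same vocabulary."""
    return sum(qw * math.log2(qw / p[w]) for w, qw in q.items() if qw > 0)


def decompose_surprisal(prior, shallow_policy, observed):
    """Split the surprisal of `observed` into a shallow and a deep part.

    Assumption (illustrative only): shallow part = KL(shallow_policy || prior),
    deep part = surprisal minus the shallow part.
    """
    surprisal = -math.log2(prior[observed])
    # A point mass on the observed word has KL divergence equal to surprisal.
    final_policy = {w: 1.0 if w == observed else 0.0 for w in prior}
    total_depth = kl_divergence(final_policy, prior)
    shallow = kl_divergence(shallow_policy, prior)
    deep = total_depth - shallow
    return surprisal, total_depth, shallow, deep


# Hypothetical next-word prior p_0 (made-up numbers, not model output).
prior = {"devoured": 0.5, "devouring": 0.25, "delicious": 0.25}
# Hypothetical shallow policy: still undecided between two similar word forms.
shallow_policy = {"devoured": 0.5, "devouring": 0.5, "delicious": 0.0}

surprisal, total_depth, shallow, deep = decompose_surprisal(
    prior, shallow_policy, observed="devouring"
)
print(f"surprisal={surprisal:.2f} bits, shallow={shallow:.2f}, deep={deep:.2f}")
```

With these made-up numbers the two parts sum to the surprisal by construction. For an arbitrary intermediate policy the remainder is not guaranteed to be non-negative; the caption restricts attention to policies on the efficient frontier, which this toy example does not compute.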