
Back to the Future: The Role of Past and Future Context Predictability in Incremental Language Production

Shiva Upadhye, Richard Futrell

TL;DR

The paper advances an information-theoretic view of how past and future context shape incremental language production by introducing conditional PMI as a principled measure that conditions on the past when assessing future informativity. Using a single infill-trained GPT-2 small model on naturalistic dialogue, the authors estimate forward, backward, bidirectional, and PMI-based predictability to reassess word durations and model substitution errors in naturalistic speech. Across two studies, they find that future-context informativity reliably reduces word duration and influences substitution choices, with conditional PMI providing a robust account that integrates past and planned future information. These findings connect contextual predictability to sentence planning mechanisms and support a resource-rational view of production under constraints. The work also offers a methodological advance by deriving all predictability measures from a single, context-aware language model and makes data and code publicly available for replication.
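The four predictability measures named above can all be derived from three log-probabilities of the same word: given the past, given the future, and given both. The sketch below is a minimal illustration. It assumes natural-log probabilities as input, for example scores taken from the infill-trained model. The function name and dictionary keys are hypothetical. Relative backward predictability is left out because this summary does not give its formula. Conditional PMI uses the standard definition, log p(w_t | C<t, C>t) - log p(w_t | C<t). By Bayes' rule this equals log p(C>t | w_t, C<t) - log p(C>t | C<t), which measures how much knowing w_t makes the upcoming context more predictable given the past.

```python
import math


def predictability_measures(logp_fwd: float, logp_bwd: float, logp_bidir: float) -> dict:
    """Derive predictability measures for one word w_t from three log-probabilities (nats).

    logp_fwd   = log p(w_t | C_<t)          (past context only)
    logp_bwd   = log p(w_t | C_>t)          (future context only)
    logp_bidir = log p(w_t | C_<t, C_>t)    (past and future together)
    """
    return {
        # Surprisal is negative log-probability: higher means less predictable.
        "forward_surprisal": -logp_fwd,
        "backward_surprisal": -logp_bwd,
        "bidirectional_surprisal": -logp_bidir,
        # Conditional PMI of w_t and the future, given the past:
        # log p(w_t | C_<t, C_>t) - log p(w_t | C_<t).
        # Positive when the future makes w_t more predictable than the past alone does.
        "conditional_pmi": logp_bidir - logp_fwd,
    }


# Usage with made-up probabilities (illustrative only, not from the paper):
m = predictability_measures(math.log(0.1), math.log(0.2), math.log(0.4))
print(round(m["conditional_pmi"], 3))  # 1.386, i.e. log(0.4 / 0.1) = log 4
```

Conditional PMI is a difference of two conditionals that share the past context. It therefore isolates the extra information contributed by the future without assuming that past and future act independently.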

Abstract

Contextual predictability shapes both the form and choice of words in online language production. The effects of the predictability of a word given its previous context are generally well-understood in both production and comprehension, but studies of naturalistic production have also revealed a poorly-understood backward predictability effect of a word given its future context, which may be related to future planning. Here, in two studies of naturalistic speech corpora, we investigate backward predictability effects using improved measures and more powerful language models, introducing a new principled and conceptually motivated information-theoretic predictability measure that integrates predictability from both the future and the past context. Our first study revisits classic predictability effects on word duration. Our second study investigates substitution errors within a generative framework that independently models the effects of lexical, contextual, and communicative factors on word choice, while predicting the actual words that surface as speech errors. We find that our proposed conceptually-motivated alternative to backward predictability yields qualitatively similar effects across both studies. Through a fine-grained analysis of substitution errors, we further show that different kinds of errors are suggestive of how speakers prioritize form, meaning, and context-based information during lexical planning. Together, these findings illuminate the functional roles of past and future context in how speakers encode and choose words, offering a bridge between contextual predictability effects and the mechanisms of sentence planning.
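Study 1 judges future-context measures by how much they improve a word-duration model over a baseline; Figure 5(b) reports this as Δ log-likelihood. The summary does not say which regression the authors fit, and studies of this kind often use mixed-effects models. The sketch below substitutes ordinary least squares on synthetic data purely to show what a nested-model Δ log-likelihood measures. All variable names, coefficients, and data are hypothetical.

```python
import math
import random


def ols_loglik(X, y):
    """Fit y ~ X by ordinary least squares; return the maximized Gaussian log-likelihood.

    X is a list of rows; each row already includes an intercept term (1.0).
    """
    n, k = len(X), len(X[0])
    # Normal equations: (X^T X) beta = X^T y
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * c for a, c in zip(A[r], A[col])]
                b[r] -= f * b[col]
    beta = [b[i] / A[i][i] for i in range(k)]
    rss = sum((yi - sum(bi * xi for bi, xi in zip(beta, row))) ** 2 for row, yi in zip(X, y))
    sigma2 = rss / n  # maximum-likelihood residual variance
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)


# Synthetic data (NOT the paper's): log duration falls as conditional PMI rises.
rng = random.Random(0)
surprisal = [rng.uniform(0, 10) for _ in range(200)]
cpmi = [rng.uniform(-3, 3) for _ in range(200)]
log_dur = [5.0 + 0.05 * s - 0.08 * c + rng.gauss(0, 0.1) for s, c in zip(surprisal, cpmi)]

X_base = [[1.0, s] for s in surprisal]                   # baseline: past-context surprisal only
X_full = [[1.0, s, c] for s, c in zip(surprisal, cpmi)]  # baseline + future-context measure
delta_ll = ols_loglik(X_full, log_dur) - ols_loglik(X_base, log_dur)
print(f"Delta log-likelihood: {delta_ll:.1f}")
```

The two models are nested, so Δ log-likelihood cannot be negative. A larger value means the added predictor accounts for more of the variation in duration, matching the reading "higher is a better fit" in Figure 5(b).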

Paper Structure

This paper contains 39 sections, 12 equations, 13 figures, 6 tables, 2 algorithms.

Figures (13)

  • Figure 1: An illustration of forward and backward-looking contextual predictability effects in naturalistic production.
  • Figure 2: Effects of communicative intent and context-based information sources on lexical planning in incremental language production.
  • Figure 3: An illustration of the information-processing dependencies between the speaker's message ($M$), past context ($C_{<t}$), future context ($C_{>t}$), and the current word ($w_t$). Here, we treat $M$ as a latent variable and $C_{>t}$ as an observed variable, even though for the speaker, the future is not realized until after the production of $w_t$. Solid lines indicate explicit conditioning dependencies; dashed lines indicate causal influences between the latent variable $M$ and the contextual representations. Black squares indicate factors, which define functions between connected variables. For example, the black square in (c) denotes that $w_t$ is a function of both $C_{<t}$ and $C_{>t}$, whereas in (b), $w_t$ is determined by independent functions of $C_{<t}$ and $C_{>t}$. Left (a): the current word depends solely on past context, as reflected in forward predictability. Middle (b): the current word is influenced separately by past and future context, which is the assumption implicit in backward predictability. Right (c): the current word is jointly influenced by past and future context, as in the case of bidirectional word probability.
  • Figure 4: Overview of the process for estimating contextual predictability variables from a custom-trained language model (LM). (a) Data augmentation process that enables estimation of all three probabilities from an autoregressive LM. $U$ is the original corpus of utterances and $U'$ is the augmented corpus. Each utterance ($u$) in $U$ was transformed by uniformly sampling a position in the utterance, selecting the word in that position, and appending this word to the end of the utterance. The past and future context with respect to the original word position are demarcated using $\texttt{<PRE>}$ and $\texttt{<SUF>}$ tokens, and the transposed word is preceded by a $\texttt{<MID>}$ token. In $50\%$ of the utterances, the order of the $\texttt{<PRE>}$ and $\texttt{<SUF>}$ contexts was swapped. This follows prior work finding that changing the order of the preceding and following sequences improves estimation of infill probabilities (Bavarian et al., 2022). See the appendix on model training and evaluation for an algorithmic implementation of this process. (b) $U'$ serves as the training input to a randomly initialized GPT-2 language model parameterized by $\theta$. (c) An illustration of the inference process for estimating forward, backward, and bidirectional probabilities from the trained GPT-2 model $\overrightarrow{p}_{\theta}$.
  • Figure 5: (a) Estimated effect sizes for all probabilistic predictors from models with relative backward predictability and conditional PMI as formulations of future context predictability. Relative backward predictability assumes independence between past and future, whereas conditional PMI assumes conditional dependence. Error bars denote standard error. $p < 0.001$ (***), $p < 0.01$ (**), $p < 0.05$ (*), $p > 0.05$ (ns); (b) Delta log-likelihood values obtained from adding future context predictability measures to the baseline model incrementally. Higher values of $\Delta$Log-Likelihood indicate a better fit to the data.
  • ...and 8 more figures
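The data-augmentation step in Figure 4(a) can be sketched as follows. This is a minimal illustration with several assumptions. It operates on whitespace-separated words rather than GPT-2 BPE tokens, and it uses the literal layout "<PRE> past <SUF> future <MID> word" (future-first when swapped). The function names, the `swap_prob` parameter, and the exact spacing are hypothetical; the authors' own algorithm is in their appendix. `bidirectional_prompt` is likewise a hypothetical reading of panel (c). Under this reading, an autoregressive LM's next-token probability for w_t after this prefix would serve as the estimate of p(w_t | C<t, C>t).

```python
import random

PRE, SUF, MID = "<PRE>", "<SUF>", "<MID>"  # sentinel tokens named in Figure 4


def augment_utterance(words, rng, swap_prob=0.5):
    """Move one uniformly sampled word to the end, marking its past and future context."""
    t = rng.randrange(len(words))                 # uniformly sampled position
    past, target, future = words[:t], words[t], words[t + 1:]
    past_part, future_part = [PRE] + past, [SUF] + future
    if rng.random() < swap_prob:                  # future-first order in ~swap_prob of cases
        context = future_part + past_part
    else:
        context = past_part + future_part
    return " ".join(context + [MID, target])


def augment_corpus(utterances, seed=0):
    """Build U' from U, skipping empty utterances."""
    rng = random.Random(seed)
    return [augment_utterance(u, rng) for u in utterances if u]


def bidirectional_prompt(words, t):
    """Hypothetical scoring prefix for p(w_t | C_<t, C_>t): score words[t] as the next token."""
    return " ".join([PRE] + words[:t] + [SUF] + words[t + 1:] + [MID])


# Usage example
u = "so I went to the store".split()
print(augment_corpus([u], seed=1))
print(bidirectional_prompt(u, 3))  # <PRE> so I went <SUF> the store <MID>
```

Because the target word is always moved to the end, a purely left-to-right model trained on $U'$ learns to predict a word after seeing both its past and its future. This is what allows a single autoregressive network to supply infill probabilities.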