Back to the Future: The Role of Past and Future Context Predictability in Incremental Language Production
Shiva Upadhye, Richard Futrell
TL;DR
The paper advances an information-theoretic view of how past and future context shape incremental language production by introducing conditional PMI as a principled measure that conditions on the past when assessing future informativity. Using a single infill-trained GPT-2 small model on naturalistic dialogue, the authors estimate forward, backward, bidirectional, and PMI-based predictability, then use these estimates to reassess word durations and to model substitution errors in naturalistic speech. Across two studies, they find that future-context informativity reliably reduces word duration and influences substitution choices, with conditional PMI providing a robust account that integrates past and planned future information. These findings connect contextual predictability to sentence planning mechanisms and support a resource-rational view of production under constraints. The work also offers a methodological advance by deriving all predictability measures from a single, context-aware language model, and it makes data and code publicly available for replication.
Abstract
Contextual predictability shapes both the form and choice of words in online language production. The effects of the predictability of a word given its previous context are generally well understood in both production and comprehension, but studies of naturalistic production have also revealed a poorly understood backward predictability effect of a word given its future context, which may be related to future planning. Here, in two studies of naturalistic speech corpora, we investigate backward predictability effects using improved measures and more powerful language models, introducing a new principled and conceptually motivated information-theoretic predictability measure that integrates predictability from both the future and the past context. Our first study revisits classic predictability effects on word duration. Our second study investigates substitution errors within a generative framework that independently models the effects of lexical, contextual, and communicative factors on word choice, while predicting the actual words that surface as speech errors. We find that our proposed conceptually motivated alternative to backward predictability yields qualitatively similar effects across both studies. Through a fine-grained analysis of substitution errors, we further show that different kinds of errors are suggestive of how speakers prioritize form, meaning, and context-based information during lexical planning. Together, these findings illuminate the functional roles of past and future context in how speakers encode and choose words, offering a bridge between contextual predictability effects and the mechanisms of sentence planning.
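To make the measure named in the TL;DR concrete, the sketch below computes a conditional PMI between a word and its future context, given its past context, from two log-probabilities. It assumes the textbook definition, log p(word | past, future) minus log p(word | past); the exact formulation and estimation procedure used by the authors are not stated in this section. The function name and the probability values are hypothetical and chosen only for illustration.

```python
import math


def conditional_pmi(logp_bidirectional: float, logp_forward: float) -> float:
    """Conditional PMI of a word with its future context, given its past.

    Assumes the standard definition:
        log p(word | past, future) - log p(word | past)
    Both arguments are natural-log probabilities; the result is in nats.
    """
    return logp_bidirectional - logp_forward


# Hypothetical probabilities (not taken from the paper):
p_forward = 0.05        # p(word | past)
p_bidirectional = 0.20  # p(word | past, future)

cpmi_nats = conditional_pmi(math.log(p_bidirectional), math.log(p_forward))
cpmi_bits = cpmi_nats / math.log(2)  # convert nats to bits
print(f"conditional PMI: {cpmi_bits:.3f} bits")
```

Under this definition, a positive value means the future context makes the word more predictable than the past context alone does, and a value of zero means the future adds no information beyond the past.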
