Revisiting Real-Time Digging-In Effects: No Evidence from NP/Z Garden-Paths

Amani Maina-Kilaas; Roger Levy

Abstract

Digging-in effects, where disambiguation difficulty increases with longer ambiguous regions, have been cited as evidence for self-organized sentence processing, in which structural commitments strengthen over time. In contrast, surprisal theory predicts no such effect unless lengthening genuinely shifts statistical expectations, and neural language models appear to show the opposite pattern. Whether digging-in is a robust real-time phenomenon in human sentence processing -- or an artifact of wrap-up processes or methodological confounds -- remains unclear. We report two experiments on English NP/Z garden-path sentences using Maze and self-paced reading, comparing human behavior with predictions from an ensemble of large language models. We find no evidence for real-time digging-in effects. Critically, items with sentence-final versus nonfinal disambiguation show qualitatively different patterns: positive digging-in trends appear only sentence-finally, where wrap-up effects confound interpretation. Nonfinal items -- the cleaner test of real-time processing -- show reverse trends consistent with neural model predictions.
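Surprisal theory's prediction can be made concrete. Surprisal is -log2 p(word | context). Under a linear linking assumption, the garden-path effect at the disambiguating word tracks the surprisal difference between the ambiguous and unambiguous versions, so a digging-in effect is predicted only if lengthening the ambiguous region increases that difference. The Python sketch below is a minimal illustration under these assumptions. The function names are hypothetical, the probabilities are invented toy values rather than model outputs or data from the paper, and it does not reproduce the paper's LLM-ensemble pipeline or its mapping from surprisal to predicted RTs.

```python
import math


def surprisal(prob: float) -> float:
    """Surprisal in bits: -log2 p(word | context)."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(prob)


def garden_path_effect(p_ambiguous: float, p_unambiguous: float) -> float:
    """Surprisal difference at the disambiguating word (ambiguous minus unambiguous)."""
    return surprisal(p_ambiguous) - surprisal(p_unambiguous)


def digging_in_contrast(gp_long: float, gp_short: float) -> float:
    """Positive: larger garden-path effect after a longer ambiguous region (digging-in).
    Negative: smaller effect after a longer region (a reverse trend)."""
    return gp_long - gp_short


# Invented toy probabilities for the disambiguating word (NOT data from the paper).
gp_short = garden_path_effect(p_ambiguous=0.01, p_unambiguous=0.25)
gp_long = garden_path_effect(p_ambiguous=0.02, p_unambiguous=0.25)
print(round(gp_short, 3), round(gp_long, 3), round(digging_in_contrast(gp_long, gp_short), 3))
```

With these toy values the contrast is negative (about -1 bit): the longer ambiguous region makes the disambiguating word less surprising, which is a reverse digging-in trend rather than digging-in.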

Paper Structure

This paper contains 22 sections, 5 figures, and 1 table.

Introduction
General Methods
Human Experiments
Surprisal Theory Predictions
Statistical Analysis
Experiment 1: Maze
Additional Details
Results
Empirical.
Predicted.
Discussion
Experiment 2: Self-Paced Reading
Additional Details
Results
Empirical.
...and 7 more sections

Figures (5)

Figure 1: Empirical response times for Experiment 1 (Maze), split by sentence-finality. Top panels show mean word RT by sentence region (omitting regions not present in all conditions). Bottom panels show the mean critical-word RT, with the right-most panel averaging RTs over a plausible spillover region. Error bars reflect 95% confidence intervals around by-item means.
Figure 2: Mean garden-path effect in Experiment 1 (Maze), split by sentence-finality. The top row shows empirical data, the middle row shows predicted data, and the bottom row shows surprisal for reference. Error bars reflect 95% confidence intervals around by-item means; however, readers should rely on the mixed-effects models to assess significance, because those models attribute variance more appropriately.
Figure 3: Empirical response times for Experiment 2 (SPR), split by sentence-finality. Top panels show mean word RT by sentence region (omitting regions not present in all conditions). Bottom panels show the mean critical-word RT, with the right-most panel averaging RTs over a plausible spillover region. Error bars reflect 95% confidence intervals around by-item means.
Figure 4: Mean garden-path effect in Experiment 2 (SPR), split by sentence-finality. The top row shows empirical data, the middle row shows predicted data, and the bottom row shows surprisal for reference. Error bars reflect 95% confidence intervals around by-item means; however, readers should rely on the mixed-effects models to assess significance, because those models attribute variance more appropriately.
Figure 5: Empirical vs. LLM-predicted response times in critical items. LLMs underpredict difficulty in disambiguating regions while accurately estimating RTs in other sentence regions.
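Figures 2 and 4 plot by-item mean garden-path effects with 95% confidence intervals, and the bottom panels of Figures 1 and 3 average RTs over a spillover region. The Python sketch below shows these descriptive summaries. It assumes trials are available as hypothetical (item, condition, RT) tuples with condition labels "ambiguous" and "unambiguous". It also assumes a normal-approximation interval (1.96 standard errors), since the paper's exact interval method is not given here, and uses an illustrative spillover window size. As the captions note, significance should be assessed with the mixed-effects models, not with these intervals.

```python
import math
import statistics
from collections import defaultdict


def by_item_garden_path(trials):
    """Per-item garden-path effect: mean ambiguous RT minus mean unambiguous RT.

    trials: iterable of (item_id, condition, rt_ms); condition labels
    'ambiguous' / 'unambiguous' are hypothetical.
    """
    rts = defaultdict(lambda: defaultdict(list))
    for item, condition, rt in trials:
        rts[item][condition].append(rt)
    effects = {}
    for item, by_cond in rts.items():
        amb, unamb = by_cond.get("ambiguous"), by_cond.get("unambiguous")
        if amb and unamb:  # skip items missing either condition
            effects[item] = statistics.mean(amb) - statistics.mean(unamb)
    return effects


def mean_with_ci95(values):
    """Mean and normal-approximation 95% CI (mean +/- 1.96 standard errors)."""
    values = list(values)
    m = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))  # needs >= 2 values
    return m, (m - 1.96 * se, m + 1.96 * se)


def spillover_rt(word_rts, critical_index, n_spillover):
    """Mean RT over the critical word plus the next n_spillover words."""
    window = word_rts[critical_index : critical_index + 1 + n_spillover]
    return statistics.mean(window)


# Toy trials invented for illustration (NOT data from the paper).
trials = [
    ("i1", "ambiguous", 520), ("i1", "unambiguous", 470),
    ("i2", "ambiguous", 610), ("i2", "unambiguous", 540),
    ("i3", "ambiguous", 495), ("i3", "unambiguous", 480),
]
effects = by_item_garden_path(trials)
gp_mean, (ci_low, ci_high) = mean_with_ci95(effects.values())
print(effects, gp_mean, (round(ci_low, 1), round(ci_high, 1)))
print(spillover_rt([350, 360, 420, 480, 400], critical_index=2, n_spillover=2))
```

Averaging within items first, as here, yields an interval over item means that does not separate participant-level from item-level variance. This is one reason the captions defer to the mixed-effects models for significance.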