Algorithmic Consequences of Particle Filters for Sentence Processing: Amplified Garden-Paths and Digging-In Effects

Amani Maina-Kilaas, Roger Levy

Abstract

Under surprisal theory, linguistic representations affect processing difficulty only through the bottleneck of surprisal. Our best estimates of surprisal come from large language models, which have no explicit representation of structural ambiguity. While LLM surprisal robustly predicts reading times across languages, it systematically underpredicts difficulty when structural expectations are violated -- suggesting that representations of ambiguity are causally implicated in sentence processing. Particle filter models offer an alternative where structural hypotheses are explicitly represented as a finite set of particles. We prove several algorithmic consequences of particle filter models, including the amplification of garden-path effects. Most critically, we demonstrate that resampling, a common practice with these models, inherently produces real-time digging-in effects -- where disambiguation difficulty increases with ambiguous region length. Digging-in magnitude scales inversely with particle count: fully parallel models predict no such effect.
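Here is a minimal sketch of the mechanism the abstract describes. It assumes two hypothetical structure labels `"T1"` and `"T2"`, a dictionary `q` holding $Q(w \mid T, C)$ for a single word $w$, and surprisal measured in nats. Each particle commits to one structure, and the word's probability is averaged over particles. Resampling redraws $N$ particles with replacement; here it is plain multinomial resampling with equal weights, which is our simplifying assumption. This is an illustration, not the authors' implementation.

```python
import math
import random
from collections import Counter

def particle_surprisal(particles, q):
    """Surprisal (nats) of a word w given a finite particle set.

    particles: list of structure labels, one per particle (hypothetical labels).
    q: dict mapping each structure T to Q(w | T, C).
    The word probability is averaged over the empirical particle
    distribution pi^(N) before taking -log.
    """
    p_w = sum(q[t] for t in particles) / len(particles)
    return -math.log(p_w)

def resample(particles, rng):
    """Equal-weight multinomial resampling: draw N particles with replacement.
    Assumes the ambiguous words so far are equally likely under every
    structure, so no reweighting is needed (a simplification)."""
    return rng.choices(particles, k=len(particles))

rng = random.Random(0)
q = {"T1": 0.004, "T2": 0.5}             # word probabilities from Figure 1
particles = ["T1"] * 20 + ["T2"] * 5     # N = 25, pi(T1 | C) = 0.8
print(round(particle_surprisal(particles, q), 3))
for _ in range(3):                        # three resampling steps
    particles = resample(particles, rng)
print(Counter(particles), round(particle_surprisal(particles, q), 3))
```

Because all weights are equal here, resampling leaves the expected share of each structure unchanged but lets it drift from run to run. Once every $T_2$ particle is lost, the surprisal of $w$ jumps to $-\log 0.004 \approx 5.52$ nats.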

Paper Structure

This paper contains 13 sections, 11 theorems, 54 equations, and 4 figures.

Key Result

Theorem 1

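The expected surprisal of any word in context increases monotonically as a function of resampling steps. That is, for every step $i \geq 0$,

$$\mathbb{E}_{\pi^{(N)}_{s_{i+1}}}\left[S\left(\pi^{(N)}_{s_{i+1}}\right)\right] \geq \mathbb{E}_{\pi^{(N)}_{s_{i}}}\left[S\left(\pi^{(N)}_{s_{i}}\right)\right].$$

Additionally, the above inequality is strict if the following statement holds: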

Figures (4)

  • Figure 1: Expected surprisal over the course of resampling. We assume only two structures, $T_1$ and $T_2$, which give word $w$ the in-context probabilities $Q(w\mid T_1,C) = 0.004$ and $Q(w\mid T_2,C) = 0.5$, so that $w$ strongly disambiguates toward $T_2$. Using 25 particles, we simulate the change in surprisal for an ambiguous context that prefers $T_1$, $\pi_{\text{AMB}}(T_1\mid C) = 0.8$, and an unambiguous context that prefers $T_2$, $\pi_{\text{UNAMB}}(T_1\mid C) = 0.2$. Expected surprisal is lower-bounded by $S(\pi)$, which reflects full parallelism. It is upper-bounded by $\mathbb{E}_{\pi^{(N)}_{s_{\infty}}}\left[S\left(\pi^{(N)}_{s_{\infty}}\right)\right]$, at which point the parser entertains only a single structure. Shaded regions indicate ±1 standard deviation of the sample (50 thousand trials); 95% error bars would not exceed the line width.
  • Figure 2: Expected surprisal (top) and garden-path effects (bottom) for a hypothetical digging-in experiment with short and long ambiguous regions, varying disambiguation strength (through $Q_1 = Q(w \mid T_1, C)$) and the number of particles $N$. Because long versions typically add 2-3 words, we model them with 2 resampling steps. Fixed parameters: $Q(w\mid T_2,C) = 0.5$, $\pi_{\text{AMB}}(T_1\mid C) = 0.8$, $\pi_{\text{UNAMB}}(T_1\mid C) = 0.2$. Estimated with 50 thousand trials; 95% error bars would not be visible.
  • Figure 3: Expected surprisal in early resampling steps, varying disambiguation strength (through $Q_1 = Q(w \mid T_1, C)$) and the number of particles $N$. Fixed parameters: $Q(w\mid T_2,C) = 0.5$, $\pi_{\text{AMB}}(T_1\mid C) = 0.8$, $\pi_{\text{UNAMB}}(T_1\mid C) = 0.2$. The second-order approximation is computed at each empirical sample of $\pi^{(N)}_{s_{i}}$ and cumulatively summed from the empirical starting point $\mathbb{E}_{\pi^{(N)}_{s_{0}}}\left[S\left(\pi^{(N)}_{s_{0}}\right)\right]$. The linear-diffusion approximation applies a constant slope estimated from the empirical sample of $\pi^{(N)}_{s_{0}}$. Shaded regions indicate ±1 standard deviation of the sample (1 million trials); 95% error bars would not exceed the line width.
  • Figure 4: True per-step surprisal increase plotted against the approximated value, using the data from Figure 3. Top: linear scale. Bottom: log-log scale.
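With only two structures, a deterministic analogue of the simulations in Figures 1 and 2 can be computed without sampling. The sketch below propagates the exact distribution over the number of $T_1$ particles instead of averaging Monte Carlo trials. It rests on our own assumptions:

- the initial $N$ particles are drawn i.i.d. from $\pi$;
- resampling across the ambiguous region is equal-weight multinomial;
- surprisal is measured in nats;
- the short region uses 0 resampling steps and the long region uses 2, following Figure 2.

All function names are hypothetical, and the paper's own figures were produced by sampling.

```python
import math

def binom_pmf_row(n, p):
    """Binomial(n, p) probabilities for k = 0..n."""
    return [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

def count_distribution(n, p_t1, steps):
    """Exact distribution over the number of T1 particles after `steps`
    equal-weight multinomial resampling steps (a Wright-Fisher chain).
    Assumes the initial n particles are drawn i.i.d. with pi(T1 | C) = p_t1."""
    dist = binom_pmf_row(n, p_t1)
    for _ in range(steps):
        new = [0.0] * (n + 1)
        for k, mass in enumerate(dist):
            if mass == 0.0:
                continue
            # Resampling from k T1 particles out of n: Binomial(n, k / n).
            for j, pj in enumerate(binom_pmf_row(n, k / n)):
                new[j] += mass * pj
        dist = new
    return dist

def expected_surprisal(n, p_t1, q1, q2, steps):
    """Expected surprisal (nats) of w after `steps` resampling steps,
    with q1 = Q(w | T1, C) and q2 = Q(w | T2, C)."""
    dist = count_distribution(n, p_t1, steps)
    return sum(mass * -math.log((k * q1 + (n - k) * q2) / n)
               for k, mass in enumerate(dist))

def garden_path_effect(n, q1, q2, steps, p_amb=0.8, p_unamb=0.2):
    """Ambiguous minus unambiguous expected surprisal; the defaults are
    the fixed context priors listed for Figure 2."""
    return (expected_surprisal(n, p_amb, q1, q2, steps)
            - expected_surprisal(n, p_unamb, q1, q2, steps))

q1, q2 = 0.004, 0.5
for s in (0, 1, 2, 5, 20):                    # Figure 1-style trajectory, N = 25
    print(s, round(expected_surprisal(25, 0.8, q1, q2, s), 3))
for n in (10, 25, 100):                       # short = 0 steps, long = 2 steps
    digging_in = garden_path_effect(n, q1, q2, 2) - garden_path_effect(n, q1, q2, 0)
    print(n, round(digging_in, 3))
```

Under these assumptions, the printed expected surprisal never decreases from one step to the next. The long-minus-short difference in the garden-path effect should approach zero as $N$ grows large. The double loop costs $O(\text{steps} \cdot N^2)$. The float conversion of `math.comb` also overflows for $N$ beyond roughly a thousand, so this is a practical substitute for sampling only at small particle counts.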

Theorems & Definitions (18)

  • Theorem 1: Expected Surprisal Under Resampling
  • Theorem 2: Cost of Repeated Resampling
  • Theorem 3: Second-Order Approximation of Surprisal Delta
  • Theorem 4: Linear-Diffusion Approximation of Surprisal Delta
  • Theorem 1 (restated): Expected Surprisal Under Resampling
  • Proof
  • Lemma 1: Maximum Expected Surprisal Under Resampling
  • Proof
  • Theorem 2 (restated): Cost of Repeated Resampling
  • Proof
  • ...and 8 more
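For intuition, the derivation below shows why Theorem 1 holds and where a second-order approximation like Theorem 3 can come from, in the two-structure case. It is our own derivation, not the paper's proof, and it may not match the exact statement of Theorem 3. It assumes equal-weight multinomial resampling of $N$ particles. We write $p_i = \pi^{(N)}_{s_i}(T_1 \mid C)$ (our notation) for the share of particles on $T_1$ after step $i$, with $Q_1 = Q(w \mid T_1, C)$ and $Q_2 = Q(w \mid T_2, C)$.

$$
\begin{aligned}
S(p) &= -\log\bigl(pQ_1 + (1-p)Q_2\bigr), \qquad
S''(p) = \frac{(Q_1 - Q_2)^2}{\bigl(pQ_1 + (1-p)Q_2\bigr)^2} \geq 0, \\
N p_{i+1} \mid p_i &\sim \operatorname{Binomial}(N, p_i)
\;\Rightarrow\; \mathbb{E}[p_{i+1} \mid p_i] = p_i, \quad
\operatorname{Var}[p_{i+1} \mid p_i] = \frac{p_i(1-p_i)}{N}, \\
\mathbb{E}[S(p_{i+1}) \mid p_i] &\geq S\bigl(\mathbb{E}[p_{i+1} \mid p_i]\bigr) = S(p_i)
\quad \text{(Jensen, since } S \text{ is convex)}, \\
\mathbb{E}[S(p_{i+1}) - S(p_i) \mid p_i] &\approx \tfrac{1}{2}\, S''(p_i)\operatorname{Var}[p_{i+1} \mid p_i]
= \frac{(Q_1 - Q_2)^2\, p_i(1-p_i)}{2N\bigl(p_iQ_1 + (1-p_i)Q_2\bigr)^2}.
\end{aligned}
$$

Averaging the third line over $p_i$ gives the two-structure case of the monotonicity in Theorem 1. The $1/N$ factor in the last line is consistent with the abstract's statement that digging-in magnitude scales inversely with particle count. With the Figure 1 values ($Q_1 = 0.004$, $Q_2 = 0.5$), the approximate per-step increase at $p_i = 0.8$ is roughly 15 times larger than at $p_i = 0.2$, because $0.4008^2 / 0.1032^2 \approx 15.1$. This asymmetry is why the ambiguous condition accumulates more surprisal than the unambiguous one.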