
Graded strength of comparative illusions is explained by Bayesian inference

Yuhan Zhang, Erxiao Wang, Cory Shain

TL;DR

The paper investigates how the comparative illusion (CI) in language can be explained by a noisy-channel Bayesian framework, predicting acceptability via the posterior $p(s_i|s_p)$. It couples language-model priors with a data-driven noise likelihood derived from human sentence-correction data, yielding an approximate posterior $\hat{p}(s_i|s_p)$ that accounts for graded illusion strength and subject-type effects. Through a large-scale acceptability study (Experiment 1) and a sentence-correction study (Experiment 2), it shows that averaging over multiple plausible interpretations (the $f_{mean}$ linking function) explains acceptability better than focusing on the single most likely interpretation, supporting a probabilistic, multi-interpretation processing account. The findings advance a unified, computational-level view of language processing in which acceptability judgments reflect posterior probabilities, and they suggest principles that may generalize to other language illusions.

Abstract

Like visual processing, language processing is susceptible to illusions in which people systematically misperceive stimuli. In one such case--the comparative illusion (CI), e.g., More students have been to Russia than I have--comprehenders tend to judge the sentence as acceptable despite its underlying nonsensical comparison. Prior research has argued that this phenomenon can be explained as Bayesian inference over a noisy channel: the posterior probability of an interpretation of a sentence is proportional to both the prior probability of that interpretation and the likelihood of corruption into the observed (CI) sentence. Initial behavioral work has supported this claim by evaluating a narrow set of alternative interpretations of CI sentences and showing that comprehenders favor interpretations that are more likely to have been corrupted into the illusory sentence. In this study, we replicate and go substantially beyond this earlier work by directly predicting the strength of illusion with a quantitative model of the posterior probability of plausible interpretations, which we derive through a novel synthesis of statistical language models with human behavioral data. Our model explains not only the fine gradations in the strength of CI effects, but also a previously unexplained effect caused by pronominal vs. full noun phrase than-clause subjects. These findings support a noisy-channel theory of sentence comprehension by demonstrating that the theory makes novel predictions about the comparative illusion that bear out empirically. This outcome joins related evidence of noisy channel processing in both illusory and non-illusory contexts to support noisy channel inference as a unified computational-level theory of diverse language processing phenomena.
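The noisy-channel relation stated in the abstract (posterior proportional to prior times corruption likelihood) and the two linking strategies contrasted in the TL;DR can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration: the candidate labels, the probability values, the helper names, and the choice to leave the scores unnormalized. The paper's exact definitions of its linking functions are not reproduced here.

```python
import math

def posterior_scores(candidates):
    """Return unnormalized posterior scores p(s_i) * p(s_p | s_i).

    `candidates` maps a label to (log_prior, log_likelihood), where the
    prior would come from a language model and the likelihood from a
    noise model. Both are hypothetical inputs in this sketch.
    """
    return {
        label: math.exp(log_prior + log_likelihood)
        for label, (log_prior, log_likelihood) in candidates.items()
    }

def f_mean(scores):
    """Average the scores over all candidate interpretations."""
    values = list(scores.values())
    return sum(values) / len(values)

def f_max(scores):
    """Keep only the score of the single most likely interpretation."""
    return max(scores.values())

# Made-up numbers for three hypothetical interpretations of one CI sentence.
toy_candidates = {
    "interpretation_a": (math.log(0.004), math.log(0.2)),
    "interpretation_b": (math.log(0.001), math.log(0.5)),
    "interpretation_c": (math.log(0.0005), math.log(0.1)),
}

scores = posterior_scores(toy_candidates)
mean_score = f_mean(scores)
max_score = f_max(scores)
print(f"f_mean = {mean_score:.6f}, f_max = {max_score:.6f}")
```

The scores are deliberately left unnormalized: normalizing over a fixed candidate set would make the mean equal to one over the number of candidates for every sentence, which would remove any graded signal.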

Paper Structure

This paper contains 29 sections, 8 equations, 7 figures, 11 tables.

Figures (7)

  • Figure 1: Acceptability ratings across the six conditions in Exp. 1. (Violin plots show the density of the rating distributions, with wider sections indicating a higher concentration of data points. White boxes mark the interquartile range, short black horizontal lines the median, and yellow dots the mean. * marks a statistically significant difference; n.s. marks a non-significant one.)
  • Figure 2: The percentage distribution of different interpretation categories of the corrected sentences in Experiment 2
  • Figure 3: Mean Damerau-Levenshtein distance (DLD) across the than-clause subject manipulations (A) and across plausible interpretations in the four conditions (B). (Error bars represent the 95% bootstrapped confidence interval; there is no data point for event negation corrections in the plural noun phrase condition because no such corrections were produced.)
  • Figure 4: GPT-2 Small: Pairwise correlation matrix for acceptability, SLOR, trial presentation order, acceptability of the baseline control sentence, and two posterior metrics. The diagonal shows density plots for each metric; the cells in the upper triangle show the Pearson correlation coefficient and the significance value; the cells in the lower triangle show scatter plots of the two compared variables and a linear regression line with shading representing the standard error. All variables except for the baseline control rating are on their original scale.
  • Figure A1: GPT-2 Small, with $p(s)$ derived from the LM: Pairwise correlation matrix for acceptability, SLOR, presentation order, acceptability of the control, and the posterior metrics. (The diagonal shows the density plot for each metric; the "Corr" cells show the Pearson correlation coefficient and the significance value; the cells in the bottom-left triangle show scatter plots of the two compared variables and a linear regression line with shading representing the standard error.)
  • ...and 2 more figures
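
Figure 3 reports a mean DLD edit distance between sentences. As a rough sketch of how such a distance can be computed, assuming DLD denotes Damerau-Levenshtein distance, the function below implements the optimal-string-alignment (restricted) variant over arbitrary sequences. The function name, the word-level tokenization, and the example correction are illustrative assumptions; the paper's own unit of comparison and variant are not specified in this section.

```python
def osa_distance(a, b):
    """Optimal-string-alignment variant of Damerau-Levenshtein distance.

    Counts insertions, deletions, substitutions, and transpositions of
    adjacent items. Works on any pair of sequences (strings or token lists).
    """
    n, m = len(a), len(b)
    # d[i][j] holds the distance between a[:i] and b[:j].
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent items.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[n][m]

# The CI example from the abstract against a hypothetical correction.
illusion = "More students have been to Russia than I have".split()
correction = "More students have been to Russia than me".split()
print(osa_distance(illusion, correction))
```

Passing token lists gives a word-level distance, while passing raw strings gives a character-level one; which unit is appropriate depends on how the noise model is defined.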