Can Language Models Be Tricked by Language Illusions? Easier with Syntax, Harder with Semantics

Yuhan Zhang; Edward Gibson; Forrest Davis

TL;DR

The paper tests whether four language models (BERT, RoBERTa, GPT-2, GPT-3) mirror human sentence processing on three language illusions (comparative, depth-charge, and NPI). It measures whole-sentence perplexity and word surprisal on carefully crafted minimal pairs and analyzes the results with mixed-effects models. No model consistently reproduces human illusion effects: results vary by illusion type, metric (perplexity vs. surprisal), and licensor. The NPI illusion, which hinges on a structural dependency, elicits illusion effects more readily than the comparative and depth-charge illusions, most visibly in surprisal, though not uniformly. These findings expose the limits of treating current LMs as cognitive models of human language processing and highlight the distinct roles of syntax and semantics in their behavior. The work underscores the need for more robust evaluation frameworks and for models that better integrate world knowledge and pragmatic inference.
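To make the two metrics concrete, here is a minimal sketch of how word surprisal and whole-sentence perplexity can be computed from per-token probabilities. It uses the standard textbook definitions; the paper's exact formulas (for example, for the masked models BERT and RoBERTa) may differ. The function names and the probability values are hypothetical and chosen only for illustration.

```python
import math

def surprisal(prob):
    """Surprisal in bits of a token whose conditional probability is `prob`."""
    return -math.log2(prob)

def perplexity(token_probs):
    """Exponentiated mean negative log-probability over a sentence's tokens."""
    mean_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_nll)

# Hypothetical per-token probabilities for a minimal pair (made-up numbers,
# not taken from the paper). Only the third token differs between conditions.
illusion_probs = [0.2, 0.1, 0.05, 0.1]
unacceptable_probs = [0.2, 0.1, 0.01, 0.1]

# Positive differences mean the unacceptable sentence is less expected.
ppl_diff = perplexity(unacceptable_probs) - perplexity(illusion_probs)
surprisal_diff = surprisal(unacceptable_probs[2]) - surprisal(illusion_probs[2])
```

Under these assumptions, a positive difference in either metric corresponds to the unacceptable condition being less expected than the illusion condition, which is the contrast that a mixed-effects model would then test for significance.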

Abstract

Language models (LMs) have been argued to overlap substantially with human beings in grammaticality judgment tasks. But when humans systematically make errors in language processing, should we expect LMs to behave like cognitive models of language and mimic human behavior? We answer this question by investigating LMs' more subtle judgments associated with "language illusions" -- sentences that are vague in meaning, implausible, or ungrammatical but receive unexpectedly high acceptability judgments by humans. We looked at three illusions: the comparative illusion (e.g. "More people have been to Russia than I have"), the depth-charge illusion (e.g. "No head injury is too trivial to be ignored"), and the negative polarity item (NPI) illusion (e.g. "The hunter who no villager believed to be trustworthy will ever shoot a bear"). We found that probabilities represented by LMs were more likely to align with human judgments of being "tricked" by the NPI illusion which examines a structural dependency, compared to the comparative and the depth-charge illusions which require sophisticated semantic understanding. No single LM or metric yielded results that are entirely consistent with human behavior. Ultimately, we show that LMs are limited both in their construal as cognitive models of human language processing and in their capacity to recognize nuanced but critical information in complicated language materials.

Paper Structure (26 sections, 2 equations, 6 figures, 5 tables)

Introduction
Related work
  LMs' linguistic abilities
  Language illusions
Methods
  Models and Measures
  Evaluation procedure
Comparative illusion
  Acceptability differentiation
  Illusion effect
  Sensitivity to manipulations
Depth-charge illusion
  Acceptability differentiation
  Illusion effect
  Sensitivity to manipulations
...and 11 more sections

Figures (6)

Figure 1: The $y$ axis shows the coefficient estimates, which represent the increase in perplexity/surprisal when the sentence is unacceptable compared to the illusion case, across three language illusions and four LMs. "+" marks human-like behavior, in this case an illusion effect where the unacceptable condition receives significantly higher perplexity/surprisal values than the illusion condition. "*" means that the estimated coefficient is significant.
Figure 2: Estimated coefficients for critical linguistic manipulations in comparative illusion. The $y$ axis shows the estimated coefficients for the increase in perplexity/surprisal with respect to singular vs. plural than-clause subjects, or nonrepeatable vs. repeatable verb phrases, respectively. "*" means statistically significant contrasts; "+" means human-like results.
Figure 3: Estimated coefficients for the plausibility contrast (reference = plausible) in depth-charge illusions. The $y$ axis shows the increase in perplexity/surprisal when the sentence is implausible vs. plausible. "*" means statistically significant contrasts; "+" means human-like behavior. While LMs and metrics differ in the "no...so...as to" and the "no...too...to" conditions, the "no...too...to not" condition yielded results completely opposite to those of humans.
Figure 4: Estimated coefficients for the illusion effect (unacceptable vs. illusion = reference) in NPI illusions. The $y$ axis shows the increase in perplexity/surprisal when the sentence is ungrammatical vs. in the illusion condition. "+" marks an illusion effect, although according to human behavior none of the three licensors should trigger one; "*" means a significant contrast.
Figure 5: Language models' performance on all three illusions. ✓ means LMs show human-like behavior.
...and 1 more figures
