When the LM misunderstood the human chuckled: Analyzing garden path effects in humans and language models
Samuel Joseph Amouyal, Aya Meltzer-Asscher, Jonathan Berant
TL;DR
The paper investigates how humans and a broad set of LLMs process garden-path sentences, to determine whether the models mirror human parsing difficulties. It combines a controlled human experiment with large-scale, few-shot LLM evaluations, and extends the validation with paraphrasing and text-to-image generation tasks. Three findings stand out: humans and many LLMs struggle with specific syntactic/semantic manipulations; stronger models behave more similarly to humans, as measured by Kendall's tau; and the cross-task signals (paraphrasing, image generation) align with the comprehension-question results. The work highlights both the potential of LLMs as tools for probing human language processing and the limitations of current models in fully capturing human garden-path sensitivity.
Abstract
Modern Large Language Models (LLMs) have shown human-like abilities in many language tasks, sparking interest in comparing LLMs' and humans' language processing. In this paper, we conduct a detailed comparison of the two on a sentence comprehension task using garden-path constructions, which are notoriously challenging for humans. Based on psycholinguistic research, we formulate hypotheses on why garden-path sentences are hard, and test these hypotheses on human participants and a large suite of LLMs using comprehension questions. Our findings reveal that both LLMs and humans struggle with specific syntactic complexities, with some models showing high correlation with human comprehension. To complement our findings, we test LLM comprehension of garden-path constructions with paraphrasing and text-to-image generation tasks, and find that the results mirror the sentence comprehension question results, further validating our findings on LLM understanding of these constructions.
