Who Relies More on World Knowledge and Bias for Syntactic Ambiguity Resolution: Humans or LLMs?
So Young Lee, Russell Scheinberg, Amber Shore, Ameeta Agrawal
TL;DR
This work examines how humans and LLMs resolve relative clause (RC) attachment ambiguity across six languages. It introduces MultiWho, a multilingual RC-attachment dataset developed through iterative linguist–LLM collaboration. The study finds that LLMs default to low (local) attachment and rely rigidly on world-knowledge biases, performing well only in unambiguous cases, whereas humans show language-specific attachment patterns and interpret flexibly when world knowledge conflicts with syntax. Methodologically, it combines English-led dataset creation, adaptation into the other languages, and forced-choice tasks, with statistical analyses across languages and answer-order conditions. The results point to a need for more diverse, pragmatically nuanced multilingual training if LLMs are to achieve human-like, flexible comprehension across languages and contexts.
Abstract
This study explores how recent large language models (LLMs) navigate relative clause attachment ambiguity and use world knowledge biases for disambiguation in six typologically diverse languages: English, Chinese, Japanese, Korean, Russian, and Spanish. We describe the creation of a novel dataset, MultiWho, for fine-grained evaluation of relative clause attachment preferences in ambiguous and unambiguous contexts. Our experiments with three LLMs indicate that, unlike humans, LLMs consistently prefer local attachment and show limited responsiveness to syntactic variations or language-specific attachment patterns. Although LLMs performed well in unambiguous cases, they rigidly prioritized world knowledge biases, lacking the flexibility of human language processing. These findings highlight the need for more diverse, pragmatically nuanced multilingual training to improve LLMs' handling of complex structures and human-like comprehension.
