Who Relies More on World Knowledge and Bias for Syntactic Ambiguity Resolution: Humans or LLMs?

So Young Lee, Russell Scheinberg, Amber Shore, Ameeta Agrawal

TL;DR

This work examines how humans and LLMs resolve relative clause (RC) attachment ambiguity across six languages, and introduces MultiWho, a multilingual RC-attachment dataset developed through iterative linguist–LLM collaboration. The study finds that LLMs default to low attachment and rely on world-knowledge biases, achieving high accuracy only in unambiguous cases, while humans exhibit language-specific attachment patterns and interpret flexibly when world knowledge conflicts with syntax. Methodologically, it combines controlled English-first sentence creation, adaptation into the other five languages, and forced-choice paradigms, with statistical analyses across languages and answer-order conditions. The results highlight the need for more diverse, pragmatically nuanced multilingual training to produce LLMs with human-like, flexible language comprehension across contexts and cultures.

Abstract

This study explores how recent large language models (LLMs) navigate relative clause attachment ambiguity and use world knowledge biases for disambiguation in six typologically diverse languages: English, Chinese, Japanese, Korean, Russian, and Spanish. We describe the process of creating a novel dataset, MultiWho, for fine-grained evaluation of relative clause attachment preferences in ambiguous and unambiguous contexts. Our experiments with three LLMs indicate that, contrary to humans, LLMs consistently exhibit a preference for local attachment, displaying limited responsiveness to syntactic variations or language-specific attachment patterns. Although LLMs performed well in unambiguous cases, they rigidly prioritized world knowledge biases, lacking the flexibility of human language processing. These findings highlight the need for more diverse, pragmatically nuanced multilingual training to improve LLMs' handling of complex structures and human-like comprehension.

Paper Structure

This paper contains 26 sections, 5 figures, 8 tables.

Figures (5)

  • Figure 1: Syntactic Structures of DP1 Modification (left) and DP2 Modification (right) in English
  • Figure 2: MultiWho Dataset: The dataset creation started with a list of requirements and three different conditions. Using a collaborative human-LLM process, we started with developing English sentences and continued through translation and localization, resulting in a multilingual dataset across six languages. While not all sentences are pragmatically equivalent in all languages, they are structurally equivalent with regard to our requirements. These datasets were evaluated in two ways: the English dataset was evaluated by 65 human annotators for ambiguity/DP-bias, and all 6 datasets were evaluated for ambiguity/DP-bias in three different answer order settings by LLMs.
  • Figure 3: High attachment (HA) response rates in ambiguous conditions (Attachment Preference)
  • Figure 4: Average matched responses with the given world knowledge and bias toward DP1 and DP2 (high and low attachment) in unambiguous conditions in English