Multilingual Relative Clause Attachment Ambiguity Resolution in Large Language Models

So Young Lee, Russell Scheinberg, Amber Shore, Ameeta Agrawal

TL;DR

The paper investigates how large language models handle relative-clause (RC) attachment ambiguity across six languages and compares their behavior with human sentence-processing patterns. It replicates Hemforth 2015 using a forced-choice RC task and analyzes RC identification and High Attachment (HA) versus Low Attachment (LA) biases with mixed-effects logistic regression across English, Spanish, French, German, Japanese, and Korean, extending the dataset to cover Japanese and Korean. Results reveal substantial cross-language and cross-model variability: longer RCs generally increase HA tendencies, but there are notable divergences and translation artifacts in non-European languages ($p<0.05$ for several effects). In Japanese and Korean, the models tend to translate internally and produce English-dominant outputs, which contributes to RC identification errors and misalignment with human patterns. The study informs multilingual LLM design and prompting strategies aimed at improving cross-linguistic syntactic ambiguity resolution in real-world settings.

Abstract

This study examines how large language models (LLMs) resolve relative clause (RC) attachment ambiguities and compares their performance to human sentence processing. Focusing on two linguistic factors, namely the length of RCs and the syntactic position of complex determiner phrases (DPs), we assess whether LLMs can achieve human-like interpretations amid the complexities of language. In this study, we evaluated several LLMs, including Claude, Gemini and Llama, in multiple languages: English, Spanish, French, German, Japanese, and Korean. While these models performed well in Indo-European languages (English, Spanish, French, and German), they encountered difficulties in Asian languages (Japanese and Korean), often defaulting to incorrect English translations. The findings underscore the variability in LLMs' handling of linguistic ambiguities and highlight the need for model improvements, particularly for non-European languages. This research informs future enhancements in LLM design to improve accuracy and human-like processing in diverse linguistic environments.
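As a rough illustration of the HA/LA bias measure mentioned in the TL;DR, the sketch below tallies forced-choice answers by RC length and reports the proportion of High Attachment (HA) choices along with a smoothed log-odds value. It covers only the descriptive step, not the mixed-effects logistic regression used in the paper. The response data, the condition labels `short` and `long`, and the function name `ha_summary` are all hypothetical and chosen for illustration; none of the numbers come from the study.

```python
from collections import defaultdict
from math import log

# Made-up forced-choice responses for illustration only (not data from the paper).
# Each row is (rc_length_condition, attachment_answer), where the answer is
# "HA" (RC attaches to the first noun) or "LA" (RC attaches to the second noun).
responses = [
    ("short", "LA"), ("short", "LA"), ("short", "HA"), ("short", "LA"),
    ("long", "HA"), ("long", "HA"), ("long", "LA"), ("long", "HA"),
]


def ha_summary(rows):
    """Return {condition: (HA proportion, smoothed log-odds of HA vs. LA)}."""
    counts = defaultdict(lambda: {"HA": 0, "LA": 0})
    for condition, answer in rows:
        counts[condition][answer] += 1

    summary = {}
    for condition, c in counts.items():
        total = c["HA"] + c["LA"]
        proportion = c["HA"] / total
        # Adding 0.5 to each count avoids log(0) when a cell is empty.
        log_odds = log((c["HA"] + 0.5) / (c["LA"] + 0.5))
        summary[condition] = (proportion, log_odds)
    return summary


summary = ha_summary(responses)
for condition, (proportion, log_odds) in summary.items():
    print(f"{condition}: HA proportion = {proportion:.2f}, log-odds = {log_odds:.2f}")
```

A positive log-odds value indicates an HA preference in that condition and a negative value an LA preference. A regression analysis would model this same quantity while also accounting for item-level and model-level variation.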

Paper Structure

This paper contains 18 sections, 5 figures, 9 tables.

Figures (5)

  • Figure 1: Syntactic structures for the two interpretations
  • Figure 2: Human sentence processing results (Hemforth 2015)
  • Figure 3: Overview of methodology
  • Figure 4: Models' performance on RC identification by language: raw counts of successful RC identifications
  • Figure 5: Distribution of attachment answers by model and language