Multilingual Relative Clause Attachment Ambiguity Resolution in Large Language Models
So Young Lee, Russell Scheinberg, Amber Shore, Ameeta Agrawal
TL;DR
The paper investigates how large language models handle relative-clause (RC) attachment ambiguity across six languages and compares their behavior with human sentence-processing patterns. It replicates Hemforth 2015 using a forced-choice RC task and analyzes RC identification and high-attachment (HA) versus low-attachment (LA) biases with mixed-effects logistic regression. The languages covered are English, Spanish, French, German, Japanese, and Korean, with the dataset extended to Japanese and Korean. Results reveal substantial cross-language and cross-model variability: longer RCs generally increase HA tendencies, but there are notable divergences and translation artifacts in the non-European languages ($p<0.05$ for several effects). In Japanese and Korean, the models tend to translate internally and produce English-dominant outputs, which contributes to RC identification errors and misalignment with human patterns. The study informs multilingual LLM design and prompting strategies aimed at improving cross-linguistic syntactic ambiguity resolution in real-world settings.
Abstract
This study examines how large language models (LLMs) resolve relative clause (RC) attachment ambiguities and compares their performance to human sentence processing. Focusing on two linguistic factors, namely the length of RCs and the syntactic position of complex determiner phrases (DPs), we assess whether LLMs can achieve human-like interpretations amid the complexities of language. In this study, we evaluated several LLMs, including Claude, Gemini and Llama, in multiple languages: English, Spanish, French, German, Japanese, and Korean. While these models performed well in Indo-European languages (English, Spanish, French, and German), they encountered difficulties in Asian languages (Japanese and Korean), often defaulting to incorrect English translations. The findings underscore the variability in LLMs' handling of linguistic ambiguities and highlight the need for model improvements, particularly for non-European languages. This research informs future enhancements in LLM design to improve accuracy and human-like processing in diverse linguistic environments.
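As a rough illustration of the analysis named in the TL;DR, a mixed-effects logistic regression over the two factors in the abstract (RC length and the syntactic position of the complex DP) could be written as below. This is only a sketch: the predictor coding, the interaction term, and the by-item random intercept $u_i$ are assumptions made for illustration, not the specification reported by the authors.

$$
\operatorname{logit} P(y_{ij} = \mathrm{HA}) = \beta_0 + \beta_1\,\mathrm{Length}_i + \beta_2\,\mathrm{Position}_i + \beta_3\,(\mathrm{Length}_i \times \mathrm{Position}_i) + u_i, \qquad u_i \sim \mathcal{N}(0, \sigma_u^2)
$$

Here $y_{ij}$ is the forced-choice response on trial $j$ for item $i$. Under this hypothetical coding, a positive $\beta_1$ would correspond to longer RCs raising the log-odds of a high-attachment response.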
