Can LLMs Detect Ambiguous Plural Reference? An Analysis of Split-Antecedent and Mereological Reference

Dang Anh, Rick Nouwen, Massimo Poesio

TL;DR

The paper investigates how LLMs handle ambiguous plural reference, focusing on split-antecedent and mereological cases. It combines production, interpretation, and ambiguity-detection tasks across multiple autoregressive models and prompting strategies to compare with human psycholinguistic patterns. Findings show partial alignment with human preferences in some contexts, but clear inconsistencies and instruction-dependence in others, highlighting both knowledge of ambiguity and limits of applying it without explicit cues. The work advances understanding of plurality representation in LLMs and underscores the role of prompting and context in eliciting robust ambiguity handling.

Abstract

Our goal is to study how LLMs represent and interpret plural reference in ambiguous and unambiguous contexts. We ask the following research questions: (1) Do LLMs exhibit human-like preferences in representing plural reference? (2) Are LLMs able to detect ambiguity in plural anaphoric expressions and identify possible referents? To address these questions, we design a set of experiments, examining pronoun production using next-token prediction tasks, pronoun interpretation, and ambiguity detection using different prompting strategies. We then assess how comparable LLMs are to humans in formulating and interpreting plural reference. We find that LLMs are sometimes aware of possible referents of ambiguous pronouns. However, they do not always follow human reference when choosing between interpretations, especially when the possible interpretation is not explicitly mentioned. In addition, they struggle to identify ambiguity without direct instruction. Our findings also reveal inconsistencies in the results across different types of experiments.

Paper Structure

This paper contains 43 sections, 2 figures, 5 tables.

Figures (2)

  • Figure 1: An illustration of the setup and results of our prompting experiments on the extent to which LLMs can detect ambiguity in anaphoric expressions. Psycholinguistic studies show that in the ambiguous sentence above, humans prefer using "it" to refer to the combination of the engine and the boxcar or to one of the objects. We found that LLMs' responses vary depending on how much information about ambiguity is present in the prompt.
  • Figure 2: Results of prompting experiments with four different prompts (P1, P2, P3, and P4). The x-axis displays the different types of LLM responses. In P1, Ind. is one of the constituents. Ind. + Mereo. means that they listed both the constituents and the mereological object. In P2, the LLMs choose between the constituents (Obj. 1 and Obj. 2), the mereological object (Mereo. Obj.), or any of the mentioned objects (Any). In P3, they answered Yes or No to whether the mereological object can be the referent. In P4, the LLMs all identified it as ambiguous. The Ambiguous + Mereo. column shows the cases where the LLMs mention the mereological object as a possible referent.