Do LLMs Understand Ambiguity in Text? A Case Study in Open-world Question Answering

Aryan Keluskar, Amrita Bhattacharjee, Huan Liu

TL;DR

Simple, training-free, token-level disambiguation methods can effectively improve LLM performance on ambiguous question answering tasks.

Abstract

Ambiguity in natural language poses significant challenges to Large Language Models (LLMs) used for open-domain question answering. LLMs often struggle with the inherent uncertainties of human communication, leading to misinterpretations, miscommunications, hallucinations, and biased responses. This significantly limits their usefulness for tasks such as fact-checking, question answering, feature extraction, and sentiment analysis. Using open-domain question answering as a test case, we compare off-the-shelf and few-shot LLM performance, focusing on measuring the impact of explicit disambiguation strategies. We demonstrate how simple, training-free, token-level disambiguation methods can be used effectively to improve LLM performance on ambiguous question answering tasks. We present our empirical findings and discuss best practices and broader impacts regarding ambiguity in LLMs.

Paper Structure

This paper contains 10 sections, 4 figures, 2 tables.

Figures (4)

  • Figure 1: The problem of ambiguity in open domain question answering (QA) (left), and how we try to solve it for large language model QA (right).
  • Figure 2: Kernel density estimate (KDE) plot comparing the cosine similarity between the ground-truth answer and the LLM response for the two disambiguation strategies, on a randomly sampled subset of 1,000 ambiguous questions.
  • Figure 3: Kernel density estimate (KDE) plot comparing the cosine similarity between the ground-truth answer and the LLM response for the two disambiguation strategies, on the subset of AmbigQA where the human-provided answer to the human-provided disambiguated question matched the ground truth.
  • Figure 4: Comparison of ground-truth (GT) answer overlap for GPT-4o and GPT-4o-mini at high and low temperatures (high = 1.0, low = 0.2). Higher overlap scores are better.