Aligning Language Models to Explicitly Handle Ambiguity

Hyuhng Joon Kim; Youna Kim; Cheonbok Park; Junyeob Kim; Choonghyun Park; Kang Min Yoo; Sang-goo Lee; Taeuk Kim

TL;DR

Experimental results on question-answering datasets show that APA enables LLMs to explicitly detect and manage ambiguous queries while retaining the ability to answer clear questions, and that APA outperforms training with gold-standard labels, especially in out-of-distribution scenarios.

Abstract

In interactions between users and language model agents, user utterances frequently exhibit ellipsis (omission of words or phrases) or imprecision (lack of exactness) for the sake of efficiency. As a result, the same input can be interpreted differently depending on the assumptions or background knowledge applied to it. Agents must therefore handle the inherent ambiguity in queries well in order to be reliable. However, even state-of-the-art large language models (LLMs) still struggle in such scenarios, primarily for two reasons: (1) LLMs are not explicitly trained to deal with ambiguous utterances; (2) the degree of ambiguity an LLM perceives may vary with the knowledge it possesses. To address these issues, we propose Alignment with Perceived Ambiguity (APA), a novel pipeline that aligns LLMs to manage ambiguous queries by leveraging their own assessment of ambiguity (i.e., perceived ambiguity). Experimental results on question-answering datasets demonstrate that APA enables LLMs to explicitly detect and manage ambiguous queries while retaining the ability to answer clear questions. Furthermore, our findings show that APA outperforms training with gold-standard labels, especially in out-of-distribution scenarios. The data and code are available at https://github.com/heyjoonkim/APA.
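As a rough illustration of how perceived ambiguity could drive data selection, the sketch below follows the outline given in the Figure 2 caption: keep the samples the model initially gets wrong, then flag those whose Infogain exceeds a threshold. The `Sample` fields, the uncertainty scores, and the reading of Infogain as a drop in uncertainty after self-disambiguation are assumptions made for illustration; this is not the released implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """Hypothetical record for one training query (field names are assumptions)."""
    query: str
    initially_correct: bool            # Stage 1: did the model already answer correctly?
    uncertainty_query: float           # assumed uncertainty score on the original query
    uncertainty_disambiguated: float   # assumed score after the model self-disambiguates


def infogain(sample: Sample) -> float:
    """Assumed definition: how much uncertainty the disambiguation removed."""
    return sample.uncertainty_query - sample.uncertainty_disambiguated


def select_perceived_ambiguous(samples: List[Sample], threshold: float) -> List[Sample]:
    """Keep samples the model failed on AND whose Infogain exceeds the threshold."""
    return [
        s for s in samples
        if not s.initially_correct and infogain(s) > threshold
    ]


samples = [
    Sample("who won the national championship", False, 2.0, 0.5),
    Sample("what is the capital of France", True, 0.25, 0.25),
    Sample("who wrote the novel Dune", False, 1.0, 0.75),
]
selected = select_perceived_ambiguous(samples, threshold=1.0)
```

Under these made-up scores, only the first query would be treated as ambiguous from the model's point of view; the selected samples would then be paired with clarification requests as training labels.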

Paper Structure (59 sections, 4 equations, 6 figures, 18 tables)

Introduction
Related Work
Ambiguity in NLP
Alignment of LLMs
Data Quality Control for Alignment
Methodology
Problem Formulation
Initial Prediction Assessment
Perceived Ambiguity Detection
Response Construction
Fixed Response
Generated Response
Supervised Fine-Tuning (SFT)
Experimental Setting
Datasets
...and 44 more sections

Figures (6)

Figure 1: An example of an ambiguous query from AmbigQA. The term "national championship" has multiple possible denotations, causing ambiguity. (Left) A model with diverse relevant knowledge might perceive the case as ambiguous. (Right) In contrast, the query can be deemed unambiguous when the model lacks substantial related knowledge. Thus, the perceived ambiguity may differ depending on the model's intrinsic knowledge.
Figure 2: The overall process of APA. We first select incorrect samples that the model currently fails to handle (Stage 1). The model then self-disambiguates these samples by leveraging its intrinsic knowledge. We measure the information gain (Infogain) between the initial input and the disambiguation, identifying samples with high Infogain as ambiguous (Stage 2). Finally, the model generates a clarification request regarding the ambiguity (Stage 3), which is used as the label for training (Stage 4).
Figure 3: Illustration of five possible results from our scenario. For ambiguous queries, the prediction is correct (①) if the model generates a clarification request; otherwise, all the other responses are classified as incorrect (②). When evaluating unambiguous queries, we compare the predictions to the ground-truth labels and categorize them as the correct prediction (③), incorrect prediction (④), or incorrect clarification request (⑤).
Figure 4: Misaligned Clarification Request Rate (MCR) of trained methods. Low MCR indicates that the model retains its intrinsic knowledge even after the alignment process. In all instances, APA exhibits the lowest MCR.
Figure 5: Changes in the F1$_{a}$ score according to the threshold value. Regardless of the threshold value, APA consistently outperforms all the baselines.
...and 1 more figure
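The five outcomes described in the Figure 3 caption can be written as a small decision function. This is a minimal sketch: the `Outcome` names and the boolean inputs are hypothetical, and how a response is recognized as a clarification request or matched against the ground truth is left outside the sketch.

```python
from enum import Enum


class Outcome(Enum):
    """Outcome labels numbered as in the Figure 3 caption (names are assumptions)."""
    AMBIGUOUS_CORRECT = 1          # (1) ambiguous query, model asks for clarification
    AMBIGUOUS_INCORRECT = 2        # (2) ambiguous query, any other response
    UNAMBIGUOUS_CORRECT = 3        # (3) clear query, prediction matches ground truth
    UNAMBIGUOUS_INCORRECT = 4      # (4) clear query, prediction does not match
    UNAMBIGUOUS_CLARIFICATION = 5  # (5) clear query, unnecessary clarification request


def categorize(is_ambiguous: bool,
               is_clarification_request: bool,
               matches_ground_truth: bool = False) -> Outcome:
    """Map one (query, response) pair to one of the five outcomes."""
    if is_ambiguous:
        # For ambiguous queries only a clarification request counts as correct.
        if is_clarification_request:
            return Outcome.AMBIGUOUS_CORRECT
        return Outcome.AMBIGUOUS_INCORRECT
    if is_clarification_request:
        # Asking for clarification on a clear query is an error.
        return Outcome.UNAMBIGUOUS_CLARIFICATION
    if matches_ground_truth:
        return Outcome.UNAMBIGUOUS_CORRECT
    return Outcome.UNAMBIGUOUS_INCORRECT
```

For example, `categorize(False, True)` returns `Outcome.UNAMBIGUOUS_CLARIFICATION`, the case in which the model asks for clarification on a query it should have answered directly.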
