Clarify When Necessary: Resolving Ambiguity Through Interaction with LMs

Michael J. Q. Zhang; Eunsol Choi

TL;DR

This paper introduces a task-agnostic framework for resolving ambiguity in large language models through clarifying questions, decomposed into when to ask, what to ask, and how to respond. It presents Intent-Sim, a novel uncertainty estimator that simulates user intents to predict when clarification will improve end-task performance, and uses an oracle to generate clarifying interactions for evaluation. The approach is demonstrated across QA, NLI, and MT, with metrics tailored to each task and two data-generation schemes (sampled and uniform). Results show clarifying interactions can boost performance and that Intent-Sim offers robust, generalizable improvements across diverse models and tasks, laying groundwork for broader interactive ambiguity resolution in AI assistants.

Abstract

Resolving ambiguities through interaction is a hallmark of natural language, and modeling this behavior is a core challenge in crafting AI assistants. In this work, we study such behavior in LMs by proposing a task-agnostic framework for resolving ambiguity by asking users clarifying questions. Our framework breaks down this objective into three subtasks: (1) determining when clarification is needed, (2) determining what clarifying question to ask, and (3) responding accurately with the new information gathered through clarification. We evaluate systems across three NLP applications: question answering, machine translation, and natural language inference. For the first subtask, we present a novel uncertainty estimation approach, Intent-Sim, that determines the utility of querying for clarification by estimating the entropy over user intents. Our method consistently outperforms existing uncertainty estimation approaches at identifying predictions that will benefit from clarification. When only allowed to ask for clarification on 10% of examples, our system is able to double the performance gains over randomly selecting examples to clarify. Furthermore, we find that Intent-Sim is robust, demonstrating improvements across a wide range of NLP tasks and LMs. Together, our work lays a foundation for studying clarifying interactions with LMs.
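The fixed-budget setting mentioned in the abstract (clarifying on only 10% of examples) can be illustrated with a minimal sketch. It assumes that each example already has an uncertainty score and a per-example performance value with and without clarification. The function names, the highest-score-first ranking, and the rounding of the budget are illustrative assumptions, not the authors' implementation.

```python
from typing import List, Sequence


def select_for_clarification(scores: Sequence[float], budget: float) -> List[int]:
    """Return indices of the examples to clarify (hypothetical helper).

    Assumption: a higher score means clarification is expected to help more,
    and `budget` is the fraction of examples we may clarify (e.g. 0.1).
    """
    if not 0.0 <= budget <= 1.0:
        raise ValueError("budget must be a fraction between 0 and 1")
    # Small epsilon guards against float error such as 10 * 0.3 != 3.0.
    k = int(len(scores) * budget + 1e-9)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def performance_under_budget(
    scores: Sequence[float],
    perf_without: Sequence[float],
    perf_with: Sequence[float],
    budget: float,
) -> float:
    """Average performance when only the selected examples are clarified."""
    if len(scores) == 0:
        raise ValueError("need at least one example")
    chosen = set(select_for_clarification(scores, budget))
    total = sum(
        perf_with[i] if i in chosen else perf_without[i]
        for i in range(len(scores))
    )
    return total / len(scores)
```

Comparing this value for an uncertainty-based ranking against the same computation with randomly assigned scores gives the kind of gain-over-random comparison the abstract describes.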

Paper Structure (37 sections, 2 figures, 12 tables)

Introduction
A Framework for Resolving Ambiguity through Interaction
Definitions
Task 1: Determining when Clarification is Necessary
Evaluation Metric: Performance Under a Fixed Interaction Budget
Evaluation Metric: AUROC
Task 2: Generating Clarifying Questions and Answers
Task 3: Responding to Clarifications
Datasets and Applications
Question Answering
QA Performance Metric
Natural Language Inference
NLI Performance Metric
Machine Translation
MT Performance Metric
...and 22 more sections

Figures (2)

Figure 1: Our three-stage framework for resolving ambiguity with clarification questions. In the first step, systems must identify which inputs will benefit from clarification. In the second step, after deciding to clarify, we provide systems with a clarifying QA pair corresponding to the gold interpretation, which we generate from existing sources of disambiguated input/output pairs. Finally, in the third step, systems use the input and the clarifying QA pair to arrive at the correct output.
Figure 2: Intent-Sim algorithm. We sample a clarifying question and responses from the LLM. We then construct an equivalence graph of responses, $G$, using NLI to determine equivalence. Finally, we identify disjoint subgraphs of $G$ with depth-first search, representing distinct intents, and estimate the entropy over intents.
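The graph step described in the Figure 2 caption can be sketched as follows. This is a minimal illustration, not the authors' code: `equivalent` is a hypothetical callable standing in for the NLI-based equivalence check, the responses are assumed to have been sampled already, and the entropy is taken over the empirical share of responses in each connected component using the natural logarithm, which is an assumption about the estimator.

```python
import math
from typing import Callable, List, Sequence


def intent_entropy(
    responses: Sequence[str],
    equivalent: Callable[[str, str], bool],
) -> float:
    """Entropy over intent clusters found in an equivalence graph.

    `equivalent` is a placeholder for an NLI-based check (assumption).
    """
    n = len(responses)
    if n == 0:
        return 0.0

    # Build the undirected equivalence graph G as adjacency lists.
    adjacency: List[List[int]] = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if equivalent(responses[i], responses[j]):
                adjacency[i].append(j)
                adjacency[j].append(i)

    # Iterative depth-first search to find disjoint subgraphs (components).
    visited = [False] * n
    component_sizes: List[int] = []
    for start in range(n):
        if visited[start]:
            continue
        visited[start] = True
        stack = [start]
        size = 0
        while stack:
            node = stack.pop()
            size += 1
            for neighbour in adjacency[node]:
                if not visited[neighbour]:
                    visited[neighbour] = True
                    stack.append(neighbour)
        component_sizes.append(size)

    # Each component is treated as one intent; p = share of responses in it.
    return sum((s / n) * math.log(n / s) for s in component_sizes)
```

With a toy equivalence check such as case-insensitive string equality, four responses that split evenly into two groups give an entropy of ln 2, while responses that all agree give 0. A higher value suggests more distinct simulated intents and therefore a stronger case for asking a clarifying question.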
