Clarify When Necessary: Resolving Ambiguity Through Interaction with LMs
Michael J. Q. Zhang, Eunsol Choi
TL;DR
This paper introduces a task-agnostic framework for resolving ambiguity in user inputs to large language models through clarifying questions, decomposed into when to ask, what to ask, and how to respond. It presents Intent-Sim, a novel uncertainty estimator that simulates user intents to predict when clarification will improve end-task performance, and uses an oracle to generate clarifying interactions for evaluation. The approach is demonstrated across QA, NLI, and MT, with metrics tailored to each task and two data-generation schemes (sampled and uniform). Results show that clarifying interactions can boost performance and that Intent-Sim offers robust, generalizable improvements across diverse models and tasks, laying groundwork for broader interactive ambiguity resolution in AI assistants.
Abstract
Resolving ambiguities through interaction is a hallmark of natural language, and modeling this behavior is a core challenge in crafting AI assistants. In this work, we study such behavior in LMs by proposing a task-agnostic framework for resolving ambiguity by asking users clarifying questions. Our framework breaks down this objective into three subtasks: (1) determining when clarification is needed, (2) determining what clarifying question to ask, and (3) responding accurately with the new information gathered through clarification. We evaluate systems across three NLP applications: question answering, machine translation, and natural language inference. For the first subtask, we present a novel uncertainty estimation approach, intent-sim, that determines the utility of querying for clarification by estimating the entropy over user intents. Our method consistently outperforms existing uncertainty estimation approaches at identifying predictions that will benefit from clarification. When only allowed to ask for clarification on 10% of examples, our system is able to double the performance gains over randomly selecting examples to clarify. Furthermore, we find that intent-sim is robust, demonstrating improvements across a wide range of NLP tasks and LMs. Together, our work lays the foundation for studying clarifying interactions with LMs.
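The idea of scoring inputs by the entropy over user intents, then clarifying only a fixed fraction of them, can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes the simulated intents for each input have already been sampled and grouped into discrete labels, and it does not show how that simulation or grouping is done. The function names `intent_entropy` and `select_for_clarification` and the `budget` parameter are hypothetical, introduced only for illustration.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def intent_entropy(intents: Sequence[Hashable]) -> float:
    """Shannon entropy (in nats) of the empirical distribution over intent labels.

    Assumption: each element is a label for one simulated user intent,
    with equivalent intents already mapped to the same label.
    """
    if not intents:
        raise ValueError("need at least one simulated intent")
    total = len(intents)
    counts = Counter(intents)
    # Sum of p * log(1/p) over the distinct intent labels.
    return sum((c / total) * math.log(total / c) for c in counts.values())


def select_for_clarification(
    intent_samples: Sequence[Sequence[Hashable]], budget: float = 0.1
) -> List[int]:
    """Return indices of the inputs with the highest intent entropy.

    `budget` is the fraction of inputs we may ask about (hypothetical
    parameter; 0.1 mirrors the 10% setting mentioned in the abstract).
    """
    scores = [intent_entropy(sample) for sample in intent_samples]
    k = round(budget * len(scores))
    # Rank inputs from most to least uncertain about the user's intent.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


if __name__ == "__main__":
    # Ten inputs; only input 3 has simulated intents that disagree.
    samples = [["x"] * 4 for _ in range(10)]
    samples[3] = ["a", "b", "c", "d"]
    print(select_for_clarification(samples, budget=0.1))
```

In this sketch an input whose simulated intents all agree has zero entropy and is never chosen before an input whose simulated intents diverge, which is the intuition behind asking for clarification only where intents are uncertain.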
