CLAM: Selective Clarification for Ambiguous Questions with Generative Language Models

Lorenz Kuhn; Yarin Gal; Sebastian Farquhar

TL;DR

CLAM addresses ambiguous user questions posed to large language models through selective clarification: it detects ambiguity, generates a clarifying question, and answers using the user's clarification. The authors introduce Ambiguous TriviaQA and an automatic evaluation protocol in which an oracle model simulates user clarifications, and they report substantial gains in end-to-end QA accuracy on ambiguous inputs while preserving performance on unambiguous ones. The work frames meta-cognition as a practical strategy for safer model deployment, and its automatic evaluation protocol is intended to scale research on multi-turn dialogue.

Abstract

Users often ask dialogue systems ambiguous questions that require clarification. We show that current language models rarely ask users to clarify ambiguous questions and instead provide incorrect answers. To address this, we introduce CLAM: a framework for getting language models to selectively ask for clarification about ambiguous user questions. In particular, we show that we can prompt language models to detect whether a given question is ambiguous, generate an appropriate clarifying question to ask the user, and give a final answer after receiving clarification. We also show that we can simulate users by providing language models with privileged information. This lets us automatically evaluate multi-turn clarification dialogues. Finally, CLAM significantly improves language models' accuracy on mixed ambiguous and unambiguous questions relative to the state of the art.
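The three prompting steps named in the abstract (detect ambiguity, ask a clarifying question, answer after clarification), together with the simulated user, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `LM` callable, the function names, and all prompt wordings are assumptions made for this example, and the few-shot examples a real prompt would contain are omitted.

```python
from typing import Callable

# Hypothetical interface: any function that maps a prompt to a completion.
LM = Callable[[str], str]
User = Callable[[str], str]


def is_ambiguous(question: str, lm: LM) -> bool:
    """Ask the model whether the question is ambiguous (illustrative prompt)."""
    prompt = f"Question: {question}\nIs this question ambiguous? Answer yes or no:"
    return lm(prompt).strip().lower().startswith("yes")


def clam_answer(question: str, lm: LM, user: User) -> str:
    """Answer directly, or clarify first when the question looks ambiguous."""
    if not is_ambiguous(question, lm):
        return lm(f"Question: {question}\nAnswer:").strip()
    # Generate a clarifying question about the ambiguous input.
    clarifying_q = lm(f"Question: {question}\nAsk a clarifying question:").strip()
    # The user (or a simulated user) supplies the clarification.
    clarification = user(clarifying_q).strip()
    # Answer the initial question given the clarification.
    final_prompt = (
        f"Question: {question}\n"
        f"Clarifying question: {clarifying_q}\n"
        f"Clarification: {clarification}\n"
        "Answer:"
    )
    return lm(final_prompt).strip()


def make_oracle_user(unambiguous_question: str, lm: LM) -> User:
    """Simulate a user by giving a model privileged information:
    the unambiguous version of the question."""
    def user(clarifying_q: str) -> str:
        prompt = (
            f"Intended question: {unambiguous_question}\n"
            f"Clarifying question: {clarifying_q}\n"
            "Clarification:"
        )
        return lm(prompt)
    return user
```

Passing the language model in as a plain callable keeps the sketch independent of any particular API, and the same callable can serve as both the answering model and the oracle.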

Paper Structure (21 sections, 10 figures, 4 tables)

Introduction
Selective Clarification QA Data Set
CLAM: Selective clarification Framework
Meta-cognition
Automatic Evaluation Protocol
Experiments
Results
Overall performance
Individual pipeline components
Related work
Conclusion
Additional experimental results
Results on davinci model
Ablation of penalty term in adjusted accuracy
Additional analysis of CLAM performance
...and 6 more sections

Figures (10)

Figure 1: (a) Normally, LMs answer one of many interpretations given an ambiguous question. (b) Our method uses few-shot classification to detect ambiguous questions and selectively asks for clarifying information needed to answer the question.
Figure 2: Overview of Selective Clarification
Figure 3: Overview of the prompts used to clarify ambiguous user inputs. In step 1a, the user asks an ambiguous question. In step 1b (omitted for clarity), the question is classified as ambiguous using a few-shot prompt. In step 2, the model is few-shot prompted to generate a clarifying question about the ambiguous user input. In step 3, the user (or an oracle model, see the Automatic Evaluation Protocol section) provides clarifying information given the clarifying question. In step 4, the model is prompted to answer the initial question given the clarification from the user.
Figure 4: Using a language model to provide clarification. We prompt a language model to provide clarifying information given a clarifying question about an ambiguous user input. Our parallel corpus of ambiguous and corresponding unambiguous questions allows us to provide the unambiguous question to the oracle LM, based on which it can then provide appropriate clarifying information about the ambiguous question. See Figure 3 for a description of the other conversational turns.
Figure 5: CLAM improves question-answering accuracy on a set of ambiguous and unambiguous Trivia questions. (a) CLAM clarifies ambiguous questions without asking for unnecessary clarification on unambiguous questions (which is reflected in the adjusted accuracy metric). Always prompting the language model to ask the user for clarification increases the accuracy on ambiguous questions but incurs a penalty on unambiguous questions. The prompting baseline rarely asks the user for clarification and thus improves the accuracy only slightly. (b) Accuracy on the full data set: without penalizing unnecessary clarifying questions, always prompting for clarification and CLAM perform comparably well, and much better than default GPT and the prompting baseline. CLAQUA and ClariQ cover only parts of the selective clarification pipeline, which is why only TriviaQA results are reported here.
...and 5 more figures
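
The caption of Figure 5 distinguishes plain accuracy from an adjusted accuracy that penalizes unnecessary clarifying questions. The exact definition is not given in this summary, so the sketch below assumes one simple form: each clarifying question asked about an unambiguous question subtracts a fixed `penalty` from that example's score. The `Outcome` record, the function names, and the default penalty of 0.5 are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Outcome:
    is_ambiguous: bool         # ground-truth label of the question
    asked_clarification: bool  # whether the system asked the user to clarify
    correct: bool              # whether the final answer was correct


def accuracy(outcomes: Sequence[Outcome]) -> float:
    """Fraction of questions answered correctly, ignoring clarification cost."""
    if not outcomes:
        raise ValueError("outcomes must be non-empty")
    return sum(o.correct for o in outcomes) / len(outcomes)


def adjusted_accuracy(outcomes: Sequence[Outcome], penalty: float = 0.5) -> float:
    """Accuracy minus an assumed per-example penalty for clarifying
    questions asked about unambiguous questions."""
    if not outcomes:
        raise ValueError("outcomes must be non-empty")
    total = 0.0
    for o in outcomes:
        score = 1.0 if o.correct else 0.0
        if o.asked_clarification and not o.is_ambiguous:
            score -= penalty  # unnecessary clarification
        total += score
    return total / len(outcomes)
```

Under this assumed form, a system that always asks for clarification keeps its plain accuracy but loses adjusted accuracy on every unambiguous question, which matches the qualitative pattern the caption describes.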
