CLARINET: Augmenting Language Models to Ask Clarification Questions for Retrieval

Yizhou Chi, Jessy Lin, Kevin Lin, Dan Klein

TL;DR

CLARINET addresses ambiguity in retrieval by training an LLM to generate clarification questions conditioned on the retriever's distribution over candidates, and by summarizing the dialogue into a language posterior that is used to re-rank those candidates. It distills explicit search over candidate questions into end-to-end learning via Fusion-in-Decoder (FiD), which makes inference cheaper while outperforming information-theoretic baselines (expected information gain, EIG, and KL divergence) and vanilla prompting; the abstract reports relative gains in retrieval success of 17% over information gain and 39% over vanilla-prompted LLMs. On a real-world tip-of-the-tongue (TOT) book dataset, CLARINET improves both top-1 retrieval and MRR, with delta-training yielding the strongest gains (MRR ≈ 0.659; cumulative retrieval ≈ 0.764). The approach highlights the value of summarizing the dialogue into a language posterior and of using a fusion-based encoder-decoder to condition question generation on per-candidate context.

Abstract

Users often make ambiguous requests that require clarification. We study the problem of asking clarification questions in an information retrieval setting, where systems often face ambiguous search queries and it is challenging to turn the uncertainty in the retrieval model into a natural language question. We present CLARINET, a system that asks informative clarification questions by choosing questions whose answers would maximize certainty in the correct candidate. Our approach works by augmenting a large language model (LLM) to condition on a retrieval distribution, finetuning end-to-end to generate the question that would have maximized the rank of the true candidate at each turn. When evaluated on a real-world retrieval dataset of users searching for books, our system outperforms traditional heuristics such as information gain on retrieval success by 17% and vanilla-prompted LLMs by 39% relative.
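The information-gain heuristic that the abstract uses as a baseline can be illustrated with a minimal sketch. This is not CLARINET's method and not the authors' code: the candidate prior, the example questions, and the per-candidate answer probabilities (`yes_prob`) are hypothetical values chosen only to show how expected information gain (EIG) ranks yes/no questions.

```python
import math

def entropy(p):
    """Shannon entropy (in bits) of a discrete distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def expected_information_gain(prior, yes_prob):
    """EIG of a yes/no question over the candidate set.

    prior[i]    -- retriever confidence in candidate i (sums to 1)
    yes_prob[i] -- assumed P(answer is "yes" | candidate i is the target)
    """
    eig = entropy(prior)
    for answer_is_yes in (True, False):
        likelihood = [q if answer_is_yes else 1 - q for q in yes_prob]
        joint = [p * l for p, l in zip(prior, likelihood)]
        p_answer = sum(joint)
        if p_answer == 0:
            continue  # this answer can never occur, so it contributes nothing
        posterior = [j / p_answer for j in joint]
        eig -= p_answer * entropy(posterior)
    return eig

# Hypothetical retriever distribution over four candidate books.
prior = [0.4, 0.3, 0.2, 0.1]

# Hypothetical questions with assumed answer probabilities per candidate.
questions = {
    "Is it a fantasy novel?": [1.0, 1.0, 0.0, 0.0],
    "Was it published after 2000?": [1.0, 1.0, 1.0, 1.0],
}

best = max(questions, key=lambda q: expected_information_gain(prior, questions[q]))
print(best)
```

In this toy setting the first question splits the probability mass 0.7 / 0.3 and is selected, while the second question has the same answer for every candidate and therefore yields zero information gain.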

Paper Structure (23 sections, 6 equations, 5 figures, 1 table)

Figures (5)

  • Figure 1: The system takes in the user's initial query and past interactions and summarizes them before feeding the summary into the retriever. The retriever computes a score for each candidate and outputs a confidence distribution. The system then separately encodes each passage concatenated with the context, and these embeddings are supplied to the decoder to generate the next clarification question.
  • Figure 2: The retrieval performance for random question selection, EIG, KL, and our model, with standard error of the mean (SEM) error bars.
  • Figure 3: The retrieval performance for model variants.
  • Figure 4: Language Posterior vs Explicit Posterior
  • Figure 5: The number of describe-type, binary, character-related, event-related, and other questions per 100 questions (roughly 10 retrieval games) selected by each objective function. The questions are labeled by GPT3.5-turbo-0613.
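
The data flow described in the Figure 1 caption can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the softmax normalization of retriever scores, the `top_k` cutoff, the input string template, and the function names are all hypothetical.

```python
import math

def confidence_distribution(scores):
    """Turn raw retriever scores into a distribution (softmax is an assumption)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def build_encoder_inputs(context, passages, scores, top_k=3):
    """Pair the dialogue summary with each top-k passage separately.

    Returns (input_text, confidence) tuples, one per candidate. In a
    Fusion-in-Decoder setup, each input_text would be encoded on its own
    and the decoder would attend over all encodings jointly.
    """
    probs = confidence_distribution(scores)
    ranked = sorted(zip(passages, probs), key=lambda x: x[1], reverse=True)[:top_k]
    # The "context: ... passage: ..." template is hypothetical.
    return [(f"context: {context} passage: {text}", p) for text, p in ranked]

# Hypothetical dialogue summary, candidate passages, and retriever scores.
context = "A children's book about a dragon who is afraid of fire."
passages = ["Book A description", "Book B description", "Book C description"]
scores = [2.0, 1.0, 0.0]

inputs = build_encoder_inputs(context, passages, scores, top_k=2)
for text, prob in inputs:
    print(f"{prob:.3f}  {text}")
```

Keeping each passage in its own encoder input is what lets the decoder condition on per-candidate context without concatenating every passage into a single long sequence.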