Modeling Future Conversation Turns to Teach LLMs to Ask Clarifying Questions

Michael J. Q. Zhang, W. Bradley Knox, Eunsol Choi

TL;DR

Ambiguity in user requests significantly challenges LLMs. The authors introduce double-turn preference labeling by simulating future turns to train LLMs to ask clarifying questions and tailor final answers to each interpretation. An automatic evaluation framework using simulated users on open-domain QA demonstrates consistent improvements in $F_1$ (about $5\%$) and in determining when clarification is needed (about $3\%$ accuracy). The results show that modeling future turns via double-turn preferences yields more effective and efficient clarifying interactions, with code and data released to support further research.

Abstract

Large language models (LLMs) must often respond to highly ambiguous user requests. In such cases, the LLM's best response may be to ask a clarifying question to elicit more information. Existing LLMs often respond by presupposing a single interpretation of such ambiguous requests, frustrating users who intended a different interpretation. We speculate this is caused by current preference data labeling practice, where LLM responses are evaluated only on their prior contexts. To address this, we assign preference labels by simulating their expected outcomes in future turns. This allows LLMs to learn to ask clarifying questions when they can generate responses that are tailored to each user interpretation in future turns. On open-domain QA datasets with multiple annotations, we evaluate systems based on their ability to ask clarifying questions to recover each user's interpretation and expected answer. We compare systems trained using our proposed preference labeling methods against standard methods, which assign preferences based only on prior context. Our method achieves a 5% improvement in F1 measured against the answer set from different interpretations of each query, showing the value of modeling future conversation turns. We further demonstrate that our method can be used to train models to judiciously determine when to ask clarifying questions, directly answering the question when clarification is unnecessary. In our experiments, we find that our method achieves a 3% improvement in accuracy of such judgments over existing methods.
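
To make the reported metric concrete, the following is a minimal sketch of an F1 score computed between a system's final answer set and the set of answers expected under the different interpretations of a query. It assumes exact matching after simple normalization; the paper's exact matching and averaging scheme may differ, and the names `normalize` and `answer_set_f1` are hypothetical.

```python
def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace (an assumed, simplified normalization)."""
    return " ".join(answer.lower().split())


def answer_set_f1(predicted: list[str], expected: list[str]) -> float:
    """Set-level F1 between predicted answers and expected answers.

    Precision: fraction of predicted answers that some interpretation expects.
    Recall: fraction of expected answers that the system produced.
    """
    pred = {normalize(a) for a in predicted}
    gold = {normalize(a) for a in expected}
    overlap = len(pred & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


# A system that presupposes one interpretation covers only one of two expected answers.
single_interpretation = answer_set_f1(["Answer A"], ["answer a", "answer b"])
# A system that recovers both interpretations covers the full answer set.
both_interpretations = answer_set_f1(["Answer A", "Answer B"], ["answer a", "answer b"])
```

Under this reading, a response that presupposes a single interpretation is penalized on recall, which is the behavior the clarifying-question approach is meant to improve.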

Paper Structure

This paper contains 28 sections, 1 equation, 3 figures, 14 tables.

Figures (3)

  • Figure 1: Our interaction scenario and preference labeling schemes. We aim to build an LLM that can interact with users to generate the final answer set $R$, containing an answer for each user, for the input query $x$. In this example, we include two responses from state-of-the-art LLMs ([A] from GPT-4 and [B] from Gemini; full responses are in the dataset details appendix), both of which presuppose an interpretation of the word "football". We also include two clarifying responses ([C] and [D]), where [C] correctly disambiguates the two intended interpretations across all users. We depict two ways to assign preferences on the LLM's initial output: single-turn and our proposed double-turn.
  • Figure 2: Depiction of our preference annotation method. Here, simulated users respond to model-generated clarifying questions and determine preference based on which clarifying-question or direct-answer responses lead to their expected answer. We then aggregate preferences across users by selecting the response that is preferred by the most users while minimizing the number of user interaction turns.
  • Figure 3: Illustration of the models in our study (right) and the data used for training them (left). The Ans-After-Clarify SFT model generates responses for the fourth turn, and the User-simulator SFT model generates responses for the third turn. All other models generate responses for the second turn.
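
The aggregation rule stated in the Figure 2 caption (pick the response preferred by the most users, breaking ties toward fewer interaction turns) can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the `Candidate` class, the `aggregate_preferences` function, and the toy data are hypothetical, and the simulated-user step is replaced by a precomputed mapping from each user to the responses that led to that user's expected answer.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    label: str   # identifier of the initial response, e.g. "A" or "C"
    turns: int   # assumed cost: 1 for a direct answer, 2 if a clarifying question is asked first


def aggregate_preferences(
    candidates: list[Candidate],
    satisfied_by: dict[str, set[str]],
) -> Candidate:
    """Select the candidate that satisfies the most users, preferring fewer turns on ties.

    satisfied_by maps a user id to the labels of candidates whose simulated
    continuation reached that user's expected answer (assumed to be given).
    """
    votes = Counter()
    for labels in satisfied_by.values():
        votes.update(labels)
    # Sort key: more satisfied users first, then fewer interaction turns.
    return max(candidates, key=lambda c: (votes[c.label], -c.turns))


# Toy version of the Figure 1 scenario: two users intend different meanings of "football".
candidates = [
    Candidate("A", turns=1),  # direct answer presupposing one interpretation
    Candidate("B", turns=1),  # direct answer presupposing the other interpretation
    Candidate("C", turns=2),  # clarifying question that separates both interpretations
    Candidate("D", turns=2),  # clarifying question that does not help either user
]
ambiguous = {"user_1": {"A", "C"}, "user_2": {"B", "C"}}
chosen_ambiguous = aggregate_preferences(candidates, ambiguous)

# When every user shares one interpretation, the direct answer wins the tie on turns.
unambiguous = {"user_1": {"A", "C"}, "user_2": {"A", "C"}}
chosen_unambiguous = aggregate_preferences(candidates, unambiguous)
```

In the ambiguous case only the clarifying response [C] serves both users, so it is selected. In the unambiguous case [A] and [C] tie on users served, and the turn-count tiebreak favors answering directly, which mirrors the goal of asking for clarification only when it is needed.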