Learning to Ask Good Questions: Ranking Clarification Questions using Neural Expected Value of Perfect Information

Sudha Rao, Hal Daumé

TL;DR

The paper addresses how to rank clarification questions by their expected utility. It introduces a neural model based on expected value of perfect information (EVPI) that jointly learns an answer representation for each (post, question) pair and a utility predictor for the post once it is updated with an answer. Candidate questions and answers are retrieved with Lucene from posts similar to the input, and the model is trained on a dataset of ~77K (post, question, answer) triples drawn from three StackExchange domains. Evaluated against expert human judgments on 500 samples, the EVPI approach shows significant improvements over controlled baselines, and the authors release the dataset to support further research. The work points toward reinforcement learning and question generation as future directions, with possible applications to clarification in user-facing systems.

Abstract

Inquiry is fundamental to communication, and machines cannot effectively collaborate with humans unless they can ask questions. In this work, we build a neural network model for the task of ranking clarification questions. Our model is inspired by the idea of expected value of perfect information: a good question is one whose expected answer will be useful. We study this problem using data from StackExchange, a plentiful online resource in which people routinely ask clarifying questions to posts so that they can better offer assistance to the original poster. We create a dataset of clarification questions consisting of ~77K posts paired with a clarification question (and answer) from three domains of StackExchange: askubuntu, unix and superuser. We evaluate our model on 500 samples of this dataset against expert human judgments and demonstrate significant improvements over controlled baselines.

Paper Structure

This paper contains 25 sections, 6 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: A post on an online Q & A forum "askubuntu.com" is updated to fill the missing information pointed out by the question comment.
  • Figure 2: The behavior of our model during test time: Given a post $p$, we retrieve 10 posts similar to post $p$ using Lucene. The questions asked to those 10 posts are our question candidates $Q$ and the edits made to the posts in response to the questions are our answer candidates $A$. For each question candidate $q_i$, we generate an answer representation $F(p,q_i)$ and calculate how close the answer candidate $a_j$ is to our answer representation $F(p,q_i)$. We then calculate the utility of the post $p$ if it were updated with the answer $a_j$. Finally, we rank the candidate questions $Q$ by their expected utility given the post $p$ (see the EVPI equation).
  • Figure 3: Training of our answer generator. Given a post $p_i$ and its question $q_i$, we generate an answer representation that is not only close to its original answer $a_i$, but also close to one of its candidate answers $a_j$ if the candidate question $q_j$ is close to the original question $q_i$.
  • Figure 4: Distribution of the count of questions in the intersection of the "valid" annotations.
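
The test-time ranking described in the Figure 2 caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the answer representations $F(p,q_i)$, the answer vectors $a_j$, and the utilities of the updated post are passed in as plain numbers, and the closeness of $a_j$ to $F(p,q_i)$ is turned into a probability with a softmax over cosine similarity, which is an assumed form. The names `cosine`, `answer_probs`, and `rank_by_evpi` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def answer_probs(f_pq, answer_vecs):
    """Assumed form: softmax over the cosine similarity between F(p, q_i) and each a_j."""
    sims = [cosine(f_pq, a) for a in answer_vecs]
    top = max(sims)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in sims]
    total = sum(exps)
    return [e / total for e in exps]


def rank_by_evpi(answer_reprs, answer_vecs, utilities):
    """Rank candidate questions by expected utility.

    answer_reprs: dict mapping question id q_i -> vector F(p, q_i)
    answer_vecs:  list of candidate answer vectors a_j
    utilities:    list of utilities of post p updated with a_j, aligned with answer_vecs
    """
    scores = {}
    for q, f_pq in answer_reprs.items():
        probs = answer_probs(f_pq, answer_vecs)
        # Expected utility: sum over j of P(a_j | p, q_i) * U(p + a_j)
        scores[q] = sum(pr * u for pr, u in zip(probs, utilities))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked, scores


# Toy inputs (made up for illustration only).
answer_vecs = [[1.0, 0.0], [0.0, 1.0]]
utilities = [0.9, 0.1]
answer_reprs = {"q1": [1.0, 0.0], "q2": [0.0, 1.0]}
ranked, scores = rank_by_evpi(answer_reprs, answer_vecs, utilities)
print(ranked, scores)
```

In this toy setup, the question whose predicted answer lies closest to the high-utility answer candidate is ranked first. In the paper, both the answer representation and the utility are produced by learned neural components rather than supplied by hand.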