
QUDSELECT: Selective Decoding for Questions Under Discussion Parsing

Ashima Suvarna, Xiao Liu, Tanmay Parekh, Kai-Wei Chang, Nanyun Peng

TL;DR

QUDSelect reframes QUD parsing as a joint task of anchor prediction and question generation. It uses instruction tuning and selective decoding to satisfy three criteria: answer compatibility, givenness, and anchor relevance. For each answer sentence, it samples multiple candidate (anchor, question) pairs, scores each with three criterion scorers, and selects the candidate with the best overall score. On the DCQA dataset, QUDSelect improves over baselines by about 9% in human evaluation and about 4% in automatic evaluation, supporting a holistic, criteria-driven approach. The framework also includes automated evaluators to reduce annotation costs. Its limitations concern hierarchical QUD structures and the efficiency of candidate sampling.
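The sample-score-select loop can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the three scorers are hypothetical word-overlap heuristics standing in for the paper's actual criterion scorers, combining their scores by an unweighted sum is an assumption, and `Candidate`, `select_qud`, and the scorer functions are illustrative names.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Sequence, Set


@dataclass(frozen=True)
class Candidate:
    anchor: int    # index of the proposed anchor sentence
    question: str  # generated implicit question


# A scorer maps (document, answer index, candidate) to a score; higher is better.
Scorer = Callable[[Sequence[str], int, Candidate], float]


def _words(text: str) -> Set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


def _overlap(question: str, reference: Set[str]) -> float:
    # Fraction of the question's distinct words that also appear in `reference`.
    q = _words(question)
    return len(q & reference) / max(len(q), 1)


def answer_compatibility(doc: Sequence[str], i: int, c: Candidate) -> float:
    # Hypothetical toy proxy: is the question about what answer sentence s_i says?
    return _overlap(c.question, _words(doc[i]))


def givenness(doc: Sequence[str], i: int, c: Candidate) -> float:
    # Hypothetical toy proxy: does the question reuse material given before s_i?
    context: Set[str] = set()
    for s in doc[:i]:
        context |= _words(s)
    return _overlap(c.question, context)


def anchor_relevance(doc: Sequence[str], i: int, c: Candidate) -> float:
    # Hypothetical toy proxy: is the question grounded in the proposed anchor?
    return _overlap(c.question, _words(doc[c.anchor]))


SCORERS: List[Scorer] = [answer_compatibility, givenness, anchor_relevance]


def select_qud(doc: Sequence[str], i: int, candidates: List[Candidate],
               scorers: List[Scorer] = SCORERS) -> Candidate:
    """Pick the best sampled (anchor, question) candidate for answer sentence s_i."""
    # Only anchors in the prior context (index < i) are well-formed.
    valid = [c for c in candidates if 0 <= c.anchor < i]
    if not valid:
        raise ValueError(f"no candidate anchor precedes sentence {i}")
    # Assumption: the three criterion scores are combined by an unweighted sum.
    return max(valid, key=lambda c: sum(score(doc, i, c) for score in scorers))


doc = [
    "The city council met on Monday.",
    "It voted to expand the bike lane network.",
    "The expansion will cost about two million dollars.",
]
# Hypothetical samples for answer sentence s_2; the first has an invalid anchor.
candidates = [
    Candidate(2, "What will it cost?"),
    Candidate(0, "When did the council meet?"),
    Candidate(1, "How much will the bike lane expansion cost?"),
    Candidate(1, "Who voted?"),
]
best = select_qud(doc, 2, candidates)
print(best)  # → Candidate(anchor=1, question='How much will the bike lane expansion cost?')
```

Filtering runs before scoring, so the first candidate, which points at the answer sentence itself, is never eligible regardless of its score. Replacing the toy heuristics with stronger scorers changes only the `scorers` list, not the selection logic.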

Abstract

Question Under Discussion (QUD) is a discourse framework that uses implicit questions to reveal discourse relationships between sentences. In QUD parsing, each sentence is viewed as an answer to a question triggered by an anchor sentence in prior context. The resulting QUD structure must conform to several theoretical criteria, such as answer compatibility (how well the question is answered), which makes QUD parsing a challenging task. Previous work builds QUD parsers as pipelines (i.e., first detect the anchor sentence in context, then generate the question). However, these parsers lack a holistic view of the task and struggle to satisfy all the criteria at once. In this work, we introduce QUDSELECT, a joint-training framework that selectively decodes QUD dependency structures with the QUD criteria in mind. Using instruction tuning, we train models to simultaneously predict the anchor sentence and generate the associated question. To explicitly incorporate the criteria, we adopt a selective decoding strategy: sample multiple QUD candidates during inference, then select the best one using criteria scorers. Our method outperforms state-of-the-art baseline models by 9% in human evaluation and 4% in automatic evaluation, demonstrating the effectiveness of our framework.

Paper Structure

This paper contains 25 sections, 7 figures, and 5 tables.

Figures (7)

  • Figure 1: An article snippet along with the associated QUD dependency structure. Each edge from $s_i$ to $s_j$ with attribute $q$ indicates that sentence $s_j$ anchors the question $q$ and sentence $s_i$ answers it.
  • Figure 2: Overview of our QUDSelect framework.
  • Figure 3: Hyperparameter analysis on the number of candidates. QUDSelect shows improved performance with an increased number of candidates.
  • Figure 4: Prompt format for instruction tuning QUD parsers.
  • Figure 5: Article snippet used in case study.
  • ...and 2 more figures
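The edge convention in Figure 1 can be captured by a small data structure. The Python sketch below is illustrative only: `QUDEdge`, `build_qud_structure`, and `questions_anchored_at` are hypothetical names, and it assumes, following the abstract, that every sentence after the first answers exactly one question whose anchor lies in the prior context. The paper's own representation may differ.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class QUDEdge:
    answer: int    # i: the answering sentence s_i
    anchor: int    # j: the anchor sentence s_j, required to satisfy j < i
    question: str  # the edge attribute q


def build_qud_structure(num_sentences: int, edges: List[QUDEdge]) -> Dict[int, QUDEdge]:
    """Index edges by answer sentence and check the assumed dependency constraints."""
    structure: Dict[int, QUDEdge] = {}
    for e in edges:
        if not 0 < e.answer < num_sentences:
            raise ValueError(f"answer index {e.answer} is out of range")
        if not 0 <= e.anchor < e.answer:
            raise ValueError(f"anchor s_{e.anchor} does not precede answer s_{e.answer}")
        if e.answer in structure:
            raise ValueError(f"s_{e.answer} already answers a question")
        structure[e.answer] = e
    # Assumption: every sentence after the first answers exactly one question.
    missing = set(range(1, num_sentences)) - set(structure)
    if missing:
        raise ValueError(f"sentences without a QUD: {sorted(missing)}")
    return structure


def questions_anchored_at(structure: Dict[int, QUDEdge], j: int) -> List[str]:
    """Return the questions anchored by sentence s_j, ordered by answer index."""
    return [e.question for _, e in sorted(structure.items()) if e.anchor == j]


qud = build_qud_structure(3, [
    QUDEdge(answer=1, anchor=0, question="What did the council decide?"),
    QUDEdge(answer=2, anchor=1, question="How much will the expansion cost?"),
])
print(questions_anchored_at(qud, 0))  # → ['What did the council decide?']
```

Keying the structure by answer sentence makes the one-question-per-sentence assumption explicit, while anchors are free to be shared: several later sentences may answer questions anchored by the same earlier sentence.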