QUDSELECT: Selective Decoding for Questions Under Discussion Parsing

Ashima Suvarna; Xiao Liu; Tanmay Parekh; Kai-Wei Chang; Nanyun Peng

TL;DR

QUDSelect reframes QUD parsing as a joint task of anchor prediction and question generation, using instruction-tuning and selective decoding to satisfy three criteria: answer compatibility, givenness, and anchor relevance. For each answer sentence, it samples multiple candidate (anchor, question) pairs, scores each candidate with three criteria scorers, and selects the best one. On the DCQA dataset, QUDSelect improves over state-of-the-art baselines by about 9% in human evaluation and about 4% in automatic evaluation, supporting a holistic, criteria-driven approach. The framework also includes automated evaluators that reduce annotation costs. Its limitations concern hierarchical QUD structures and the efficiency of candidate sampling.
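The selection step amounts to a best-of-N choice over sampled candidates. The Python sketch below is illustrative only: it assumes each criterion scorer returns a float where higher is better and that the scores are combined by an unweighted sum, neither of which is specified here. The three scorers are hypothetical word-overlap toys standing in for the real criteria scorers, and `Candidate`, `select_best`, and the example article are invented for illustration.

```python
import re
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Candidate:
    anchor: int    # index of the anchor sentence in the prior context
    question: str  # the generated implicit question


# A criterion scorer maps (sentences, answer index, candidate) to a score;
# higher means the candidate better satisfies that criterion.
Scorer = Callable[[Sequence[str], int, Candidate], float]


def words(text: str) -> set:
    """Lowercased word set; a crude tokenizer used only by the toy scorers."""
    return set(re.findall(r"[a-z]+", text.lower()))


def overlap(question: str, text: str) -> float:
    """Fraction of the question's distinct words that also occur in text."""
    q = words(question)
    return len(q & words(text)) / len(q) if q else 0.0


# Hypothetical word-overlap stand-ins for the three criteria scorers.
def answer_compatibility(sents, i, c):
    return overlap(c.question, sents[i])  # does s_i address q?


def givenness(sents, i, c):
    # does q reuse material from the context up to and including the anchor?
    return overlap(c.question, " ".join(sents[: c.anchor + 1]))


def anchor_relevance(sents, i, c):
    return overlap(c.question, sents[c.anchor])  # is q grounded in the anchor?


def select_best(sents: Sequence[str], i: int,
                candidates: Sequence[Candidate],
                scorers: Sequence[Scorer]) -> Candidate:
    """Pick the candidate with the highest equally weighted sum of scores."""
    if not candidates:
        raise ValueError("selective decoding needs at least one candidate")
    return max(candidates, key=lambda c: sum(s(sents, i, c) for s in scorers))


# Toy usage: choose a QUD for the answer sentence at index 2.
sents = [
    "The city council approved a new park.",
    "Construction will begin in May.",
    "It will cost two million dollars.",
]
cands = [Candidate(0, "How much will the park cost?"),
         Candidate(1, "Who approved it?")]
best = select_best(sents, 2, cands,
                   [answer_compatibility, givenness, anchor_relevance])
print(best)  # Candidate(anchor=0, question='How much will the park cost?')
```

Swapping in stronger scorers only changes the functions passed as `scorers`; sampling more candidates widens the pool the final choice is made from, which is the knob examined in Figure 3.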

Abstract

Question Under Discussion (QUD) is a discourse framework that uses implicit questions to reveal discourse relationships between sentences. In QUD parsing, each sentence is viewed as an answer to a question triggered by an anchor sentence in prior context. The resulting QUD structure is required to conform to several theoretical criteria such as answer compatibility (how well the question is answered), making QUD parsing a challenging task. Previous works construct QUD parsers in a pipelined manner (i.e., detect the trigger sentence in context and then generate the question). However, these parsers lack a holistic view of the task and struggle to satisfy all the criteria. In this work, we introduce QUDSELECT, a joint-training framework that selectively decodes QUD dependency structures while taking the QUD criteria into account. Using instruction-tuning, we train models to simultaneously predict the anchor sentence and generate the associated question. To explicitly incorporate the criteria, we adopt a selective decoding strategy: we sample multiple QUD candidates during inference and then select the best one with criteria scorers. Our method outperforms state-of-the-art baseline models by 9% in human evaluation and 4% in automatic evaluation, demonstrating the effectiveness of our framework.
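To make the dependency structure concrete, here is a minimal Python sketch using the edge notation of Figure 1 (an edge from $s_i$ to $s_j$ labeled $q$: $s_j$ anchors $q$, $s_i$ answers it). The class and function names are hypothetical, and the rule that each answer sentence has exactly one incoming QUD is an assumption drawn from the abstract's framing, not a specification from the paper. The only hard constraint enforced is that the anchor comes from prior context.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class QUDEdge:
    answer: int    # i: index of the sentence s_i that answers the question
    anchor: int    # j: index of the sentence s_j that anchors (triggers) it
    question: str  # the implicit question q

    def __post_init__(self) -> None:
        # The anchor comes from prior context, so it must precede the answer.
        if not 0 <= self.anchor < self.answer:
            raise ValueError(f"anchor {self.anchor} must precede answer {self.answer}")


def build_structure(edges: Iterable[QUDEdge], num_sentences: int) -> Dict[int, QUDEdge]:
    """Index edges by answer sentence (assumes one QUD per answer sentence)."""
    structure: Dict[int, QUDEdge] = {}
    for e in edges:
        if e.answer >= num_sentences:
            raise ValueError(f"answer index {e.answer} out of range")
        if e.answer in structure:
            raise ValueError(f"sentence {e.answer} already answers a QUD")
        structure[e.answer] = e
    return structure


def anchored_by(structure: Dict[int, QUDEdge], j: int) -> List[int]:
    """Indices of the sentences that answer questions anchored at s_j."""
    return sorted(i for i, e in structure.items() if e.anchor == j)


# Toy three-sentence article: s_1 and s_2 both answer questions anchored at s_0.
edges = [
    QUDEdge(answer=1, anchor=0, question="When will construction begin?"),
    QUDEdge(answer=2, anchor=0, question="How much will the park cost?"),
]
structure = build_structure(edges, num_sentences=3)
print(anchored_by(structure, 0))  # [1, 2]
```

Because every answer sentence points back to a single earlier anchor, the structure is a tree rooted at the first sentence, which is why a parser can predict it one (anchor, question) pair at a time.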

Paper Structure (25 sections, 7 figures, 5 tables)

Introduction
Related Work
The QUDSelect Framework
Task Formulation
Overview
QUD Parser Training
Selective Decoding
Criteria Scorers
Experimental Setup
Models and Datasets
Baselines
Human Evaluation
Automatic Evaluation
Results and Analysis
Main Results
...and 10 more sections

Figures (7)

Figure 1: An article snippet along with the associated QUD dependency structure. Each edge from $s_i$ to $s_j$ with attribute $q$ indicates that sentence $s_j$ anchors the question $q$ and sentence $s_i$ answers it.
Figure 2: Overview of our QUDSelect framework.
Figure 3: Hyperparameter analysis on the number of candidates. QUDSelect shows improved performance with an increased number of candidates.
Figure 4: Prompt format for instruction tuning QUD parsers.
Figure 5: Article snippet used in case study.
...and 2 more figures
