Controllable Decontextualization of Yes/No Question and Answers into Factual Statements
Lingbo Mo, Besnik Fetahu, Oleg Rokhlenko, Shervin Malmasi
TL;DR
The paper tackles decontextualizing Yes/No polar QA by defining the PAR task, which rewrites polar question answers into standalone factual statements. It introduces SMF, a Transformer-based seq2seq model that uses automatically extracted constraints from constituency parses and encodes them as a soft-mention flag matrix $\textbf{M}$ to enforce constraint satisfaction semantically during decoding, with shared and style-aware variants (SMF and SMF-Style). The approach operates on inputs $\textbf{x}=[\textbf{q}; \text{SEP}; \textbf{a}; \text{SEP}; \textbf{c}]$ to produce $\textbf{y}$, leveraging constraint embeddings $\textbf{M}^k$ and $\textbf{M}^v$ in cross-attention, and achieves strong performance on a 1500-instance PAR dataset drawn from Amazon PQA, outperforming baselines including T5, CBS, MF, GPT-2, and COLD on automated metrics and human judgments. The work demonstrates robust out-of-domain generalization and shows that semantic-level constraint satisfaction is more effective for this rewriting task than token-level constraint enforcement, enabling broader reuse of PQA-derived knowledge across diverse question types.
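As a rough illustration of the input layout described above, the following Python sketch concatenates a question, an answer, and a list of constraint phrases into a single sequence with two separators. The separator string `[SEP]`, the helper name `build_input`, the whitespace joining of constraints, and the example question are all assumptions made for illustration; the source only states that the input is the question, answer, and constraints joined by SEP tokens.

```python
from typing import List

# Assumed separator string; the source only names a SEP token.
SEP = "[SEP]"


def build_input(question: str, answer: str, constraints: List[str]) -> str:
    """Concatenate question, answer, and constraints as [q; SEP; a; SEP; c].

    Hypothetical helper: constraints are assumed to be plain-text phrases
    and are joined with single spaces.
    """
    constraint_text = " ".join(c.strip() for c in constraints)
    return f"{question.strip()} {SEP} {answer.strip()} {SEP} {constraint_text}"


# Made-up polar question and answer, not taken from the dataset.
x = build_input(
    "Does this case fit the 11-inch model?",
    "Yes, it fits perfectly.",
    ["this case", "fit the 11-inch model"],
)
print(x)
```

In a real pipeline the separator would normally be whatever special token the chosen tokenizer defines, and the constraints would come from the constituency-parse extraction step rather than being written by hand.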
Abstract
Yes/No or polar questions represent one of the main linguistic question categories. They consist of a main interrogative clause, for which the answer is binary (assertion or negation). Polar questions and answers (PQA) represent a valuable knowledge resource present in many community and other curated QA sources, such as forums or e-commerce applications. Using answers to polar questions alone in other contexts is not trivial. Answers are contextualized, and presume that the interrogative question clause and any shared knowledge between the asker and answerer are provided. We address the problem of controllable rewriting of answers to polar questions into decontextualized and succinct factual statements. We propose a Transformer sequence-to-sequence model that utilizes soft constraints to ensure controllable rewriting, such that the output statement is semantically equivalent to its PQA input. Evaluation on three separate PQA datasets, measured through automated and human evaluation metrics, shows that our proposed approach achieves the best performance when compared to existing baselines.
