Consistency Training by Synthetic Question Generation for Conversational Question Answering

Hamed Hematian Hemati; Hamid Beigy

TL;DR

CoTaH addresses the challenge of noisy historical context in conversational QA by augmenting the history with synthetic questions and enforcing consistency between predictions made with the original history and with the augmented history. The method is model-agnostic and comprises a History Augmentation Module (a conversational question generator $CQG_\theta$ and a question selector $QS$) and a QA Module that minimizes $L_T = L_{CE} + \lambda L_{Cons}$, where $L_{Cons} = D_{KL}(QA_{\theta'}(q_k, H_k, D) \| QA_{\theta'}(q_k, H_k^{\star}, D))$. Evaluations on QuAC show a +1.8 F1 gain over a strong baseline, with pronounced improvements on questions that have substantial historical context, indicating that the model's reasoning is robust to noise in the history. Overall, the approach offers a practical, efficient path to more robust conversational QA by decoupling history augmentation from inference-time computation while remaining model-agnostic.
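The training objective above can be illustrated with a minimal, dependency-free sketch. It assumes the QA model's output is a single probability distribution over candidate answer positions, that the cross-entropy term is computed on the original-history branch, and that `lam` stands in for $\lambda$ with a purely illustrative default; none of these details are confirmed by the summary, and the function names are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def cross_entropy(p, gold_index, eps=1e-12):
    """Negative log-likelihood of the gold answer position."""
    return -math.log(p[gold_index] + eps)

def total_loss(logits_orig, logits_aug, gold_index, lam=1.0):
    """L_T = L_CE + lam * L_Cons.

    logits_orig: model scores given the original history H_k
    logits_aug:  model scores given the augmented history H_k^star
    Assumption: L_CE is taken on the original-history branch.
    """
    p_orig = softmax(logits_orig)
    p_aug = softmax(logits_aug)
    l_ce = cross_entropy(p_orig, gold_index)
    l_cons = kl_divergence(p_orig, p_aug)
    return l_ce + lam * l_cons, l_ce, l_cons

if __name__ == "__main__":
    # Toy scores over four candidate answer positions.
    loss, l_ce, l_cons = total_loss([2.0, 0.5, 0.1, -1.0],
                                    [1.5, 0.9, 0.1, -1.0],
                                    gold_index=0, lam=0.5)
    print(f"L_T={loss:.4f}  L_CE={l_ce:.4f}  L_Cons={l_cons:.4f}")
```

When the two branches produce identical distributions, the consistency term vanishes and the objective reduces to the cross-entropy loss; the more the augmented history shifts the prediction, the larger the penalty.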

Abstract

Efficiently modeling historical information is a critical component in addressing user queries within a conversational question-answering (QA) context, as historical context plays a vital role in clarifying the user's questions. However, irrelevant history induces noise in the reasoning process, especially for those questions with a considerable historical context. In our novel model-agnostic approach, referred to as CoTaH (Consistency-Trained augmented History), we augment the historical information with synthetic questions and subsequently employ consistency training to train a model that utilizes both real and augmented historical data to implicitly make the reasoning robust to irrelevant history. To the best of our knowledge, this is the first instance of research using question generation as a form of data augmentation to model conversational QA settings. By citing a common modeling error prevalent in previous research, we introduce a new baseline model and compare our model's performance against it, demonstrating an improvement in results, particularly when dealing with questions that include a substantial amount of historical context. The source code can be found on our GitHub page.

Paper Structure (22 sections, 3 equations, 3 figures, 6 tables)

Introduction
Related Works
Problem Definition
Methodology
History Augmentation Module
Training
Question Generation
Question Filtering & Injection
Question Answering Module
Setup
Results
Conclusions
Limitations
Appendix
Data Splitting
...and 7 more sections

Figures (3)

Figure 1: Architecture of the model. For a given question $q_k$, the conversational question generator $CQG_\theta$ constructs a pool of questions denoted as $P_k$. Questions in $H_k$ are shown in blue, and synthetic questions are shown in green and red: synthetic questions that are similar to the $H_k$ questions are marked in red, while dissimilar ones are in green. The question selector $QS$ discards the red synthetic questions, selects the $M$ questions with the highest scores, and samples $S=3$ synthetic questions uniformly from the green questions; these are combined with the $H_k$ questions to create $H_k^{\star}$. The QA network $QA_{\theta'}$ computes its output using both $H_k$ and $H_k^{\star}$ as input, and is trained by minimizing $L_{CE}$ and $L_{Cons}$.
Figure 2: The F1 score of the test set dialog turns
Figure 3: A comparison between the answers extracted by Bert and by CoTaH-Bert for the same question. CoTaH-Bert successfully ignores the irrelevant history and extracts the correct answer, whereas the Bert model is confused by it and returns a wrong answer.
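
The selection procedure described in the Figure 1 caption can be sketched roughly as follows. This is only an illustration under stated assumptions: the token-level Jaccard similarity, the `threshold` value, the externally supplied `scores`, and the choice to append the sampled questions at the end of the history are all hypothetical stand-ins, since the caption does not specify how similarity and scores are computed or where the synthetic questions are injected.

```python
import random

def jaccard(a, b):
    """Token-level Jaccard similarity (a hypothetical stand-in metric)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

def select_questions(history, pool, scores, m=5, s=3, threshold=0.5, rng=None):
    """Build an augmented history from H_k (history) and P_k (pool).

    1. Discard pool questions too similar to any history question ("red").
    2. Keep the m highest-scoring remaining ("green") questions.
    3. Sample up to s of them uniformly without replacement.
    4. Assumption: append the sampled questions to the real history.
    """
    rng = rng or random.Random()
    green = [(q, sc) for q, sc in zip(pool, scores)
             if all(jaccard(q, h) < threshold for h in history)]
    green.sort(key=lambda pair: pair[1], reverse=True)
    top_m = [q for q, _ in green[:m]]
    chosen = rng.sample(top_m, min(s, len(top_m)))
    return history + chosen

if __name__ == "__main__":
    history = ["when was she born", "where did she study"]
    pool = ["when was she born exactly", "what awards did she win",
            "who were her parents", "did she write any books",
            "what was her first job"]
    scores = [0.9, 0.8, 0.7, 0.6, 0.1]
    for q in select_questions(history, pool, scores, m=3, s=3,
                              rng=random.Random(0)):
        print(q)
```

In this toy run the near-duplicate of a history question is filtered out before scoring, so it can never reach the augmented history regardless of its score.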
