Towards Robust Extractive Question Answering Models: Rethinking the Training Methodology

Son Quoc Tran; Matt Kretchmar

TL;DR

This paper proposes a novel training method to improve the robustness of Extractive Question Answering models, which includes a novel loss function for the EQA problem and challenges an implicit assumption present in numerous EQA datasets.

Abstract

This paper proposes a novel training method to improve the robustness of Extractive Question Answering (EQA) models. Previous research has shown that existing models, when trained on EQA datasets that include unanswerable questions, demonstrate a significant lack of robustness against distribution shifts and adversarial attacks. Despite this, the inclusion of unanswerable questions in EQA training datasets is essential for ensuring real-world reliability. Our proposed training method includes a novel loss function for the EQA problem and challenges an implicit assumption present in numerous EQA datasets. Models trained with our method maintain in-domain performance while achieving a notable improvement on out-of-domain datasets, yielding an overall F1 score improvement of 5.7 across all testing sets. Furthermore, our models are significantly more robust against two types of adversarial attacks: their performance decrease is only about a third of that observed for the default models.

Paper Structure

This paper contains 29 sections, 13 equations, 1 figure, 9 tables.

Introduction
Related Work
Models and Tasks
Models
Extractive Question Answering
Datasets
Adversarial Attacks
Robustness Evaluation
Algorithms for Attack Construction
AddOneSent Attacks
Negation Attacks
Extractive Question Answering Loss Functions
Default Loss Function
Our Loss Function
Inference Pipeline
...and 14 more sections

Figures (1)

Figure 1: The training dynamics of RoBERTa models trained using the Devlin method versus our proposed method on SQuAD 2.0. We analyze the performance gap on unanswerable questions between SQuAD 2.0 and SQuAD AGent across three training epochs. The error bars represent the standard deviations of five runs.
