Correct after Answer: Enhancing Multi-Span Question Answering with Post-Processing Method

Jiayi Lin, Chenyang Zhang, Haibo Tong, Dongyu Zhang, Qingqing Hong, Bingxuan Hou, Junli Wang

TL;DR

This work proposes the Answering-Classifying-Correcting (ACC) framework, which employs a post-processing strategy to handle incorrect predictions. Experiments show that the ACC framework significantly improves Exact Match (EM) scores, and further analysis demonstrates that it efficiently reduces the number of incorrect predictions, improving prediction quality.

Abstract

Multi-Span Question Answering (MSQA) requires models to extract one or multiple answer spans from a given context to answer a question. Prior work mainly focuses on designing specific methods or applying heuristic strategies to encourage models to produce more correct predictions. However, these models are trained on gold answers and fail to consider incorrect predictions. Through a statistical analysis, we observe that models with stronger abilities do not produce fewer incorrect predictions than other models. In this work, we propose the Answering-Classifying-Correcting (ACC) framework, which employs a post-processing strategy to handle incorrect predictions. Specifically, the ACC framework first introduces a classifier that sorts the predictions into three types (correct, partially correct, and wrong) and excludes "wrong predictions", then introduces a corrector that modifies "partially correct predictions". Experiments on several MSQA datasets show that the ACC framework significantly improves Exact Match (EM) scores, and further analysis demonstrates that it efficiently reduces the number of incorrect predictions, improving the quality of predictions.
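The classify-then-correct flow described in the abstract can be sketched roughly as follows. This is only an illustration of the control flow, not the authors' implementation: the function name `acc_postprocess`, the label constants, and the two toy stand-ins are all hypothetical. In the paper the classifier and corrector are models that do not see gold answers; the stand-ins here look up a small gold list purely so the example runs, using the predictions from Figure 1. The question and context strings are placeholders.

```python
from typing import Callable, List

# Hypothetical label names for the three prediction types.
CORRECT, PARTIAL, WRONG = "correct", "partially correct", "wrong"


def acc_postprocess(
    question: str,
    context: str,
    predictions: List[str],
    classify: Callable[[str, str, str], str],
    correct: Callable[[str, str, str], str],
) -> List[str]:
    """Drop wrong predictions, rewrite partially correct ones, keep the rest."""
    kept = []
    for pred in predictions:
        label = classify(question, context, pred)
        if label == WRONG:
            continue  # exclude "wrong predictions"
        if label == PARTIAL:
            pred = correct(question, context, pred)  # modify the span
        kept.append(pred)
    return kept


# Toy stand-ins (assumption: they peek at gold answers, which a real
# classifier or corrector would not be able to do at inference time).
GOLD = ["Becky Sloan", "Joseph Pelling"]


def toy_classify(question: str, context: str, pred: str) -> str:
    if pred in GOLD:
        return CORRECT
    pred_tokens = set(pred.split())
    if any(pred_tokens & set(gold.split()) for gold in GOLD):
        return PARTIAL
    return WRONG


def toy_correct(question: str, context: str, pred: str) -> str:
    pred_tokens = set(pred.split())
    for gold in GOLD:
        if pred_tokens & set(gold.split()):
            return gold
    return pred


result = acc_postprocess(
    "<question placeholder>",
    "<context placeholder>",
    ["Joseph Pelling", "Sloan", "DHMIS"],
    toy_classify,
    toy_correct,
)
print(result)
```

With the Figure 1 predictions, "Joseph Pelling" passes through unchanged, "Sloan" is routed to the corrector, and "DHMIS" is discarded.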

Paper Structure

This paper contains 56 sections, 10 equations, 6 figures, 14 tables.

Figures (6)

  • Figure 1: An example of MSQA. This question has two gold answers: "Becky Sloan" and "Joseph Pelling". "Joseph Pelling" is a correct prediction, "Sloan" is a partially correct prediction, and "DHMIS" is a wrong prediction. Best viewed in color.
  • Figure 2: The distribution of correct predictions, partially correct predictions and wrong predictions on the validation set of MultiSpanQA. The validation set of MultiSpanQA contains 653 questions with 1,911 gold answers.
  • Figure 3: The overall architecture of our proposed ACC framework.
  • Figure 4: Top: Average Word Overlap of the predictions. Bottom: Average BERTScore of the predictions. After applying the ACC framework, both Word Overlap and BERTScore rise, indicating that the ACC framework effectively enhances the quality of the predictions.
  • Figure 5: Case study. The examples are selected from the validation set of MultiSpanQA. The correct predictions and gold answers are in green, the partially correct predictions are in blue, and the wrong predictions are in red. Best viewed in color.
  • ...and 1 more figures