Improving QA Model Performance with Cartographic Inoculation

Allen Chen, Okan Tanrikulu

TL;DR

QA models can overfit to dataset artifacts, limiting real-world generalization. The authors present cartographic inoculation, which uses dataset cartography to identify ambiguous adversarial examples and selectively fine-tune models on them to reduce artifact reliance. This approach achieves faster performance gains, better out-of-domain accuracy, and less overfitting than standard inoculation, demonstrated across SQuAD, Adversarial SQuAD, Randomized Adversarial SQuAD, and TriviaQA. The work offers a practical, data-map-guided strategy to make QA systems more robust to artifacts in benchmark datasets and more applicable to real-world tasks.

Abstract

QA models are faced with complex and open-ended contextual reasoning problems, but can often learn well-performing solution heuristics by exploiting dataset-specific patterns in their training data. These patterns, or "dataset artifacts", reduce the model's ability to generalize to real-world QA problems. Utilizing an ElectraSmallDiscriminator model trained for QA, we analyze the impacts and incidence of dataset artifacts using an adversarial challenge set designed to confuse models reliant on artifacts for prediction. Extending existing work on methods for mitigating artifact impacts, we propose cartographic inoculation, a novel method that fine-tunes models on an optimized subset of the challenge data to reduce model reliance on dataset artifacts. We show that by selectively fine-tuning a model on ambiguous adversarial examples from a challenge set, significant performance improvements can be made on the full challenge dataset with minimal loss of model generalizability to other challenging environments and QA datasets.
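
The selection step described above can be illustrated with a minimal sketch. It assumes, following the data-map idea, that each challenge example has one F1 score per training epoch, and that "ambiguous" means high variability of that score across epochs. The function names, the `fraction` parameter, and the toy scores are hypothetical and are not taken from the paper; the authors' actual selection criterion and threshold may differ.

```python
from statistics import mean, pstdev


def data_map(f1_by_epoch):
    """Compute per-example cartography statistics.

    f1_by_epoch maps an example id to its list of F1 scores, one per epoch.
    Confidence is the mean F1; variability is the standard deviation.
    """
    return {
        ex_id: {"confidence": mean(scores), "variability": pstdev(scores)}
        for ex_id, scores in f1_by_epoch.items()
    }


def select_ambiguous(f1_by_epoch, fraction=0.25):
    """Return the ids of the most variable examples (hypothetical cutoff)."""
    stats = data_map(f1_by_epoch)
    k = max(1, round(fraction * len(stats)))
    ranked = sorted(
        stats, key=lambda ex_id: stats[ex_id]["variability"], reverse=True
    )
    return ranked[:k]


# Toy per-epoch F1 scores (made up for illustration only).
f1_by_epoch = {
    "easy": [1.0, 1.0, 1.0, 1.0],         # always right: low variability
    "hard": [0.0, 0.0, 0.0, 0.0],         # always wrong: low variability
    "ambiguous_a": [0.0, 1.0, 0.0, 1.0],  # flips between epochs
    "ambiguous_b": [0.2, 0.9, 0.4, 1.0],
}

subset = select_ambiguous(f1_by_epoch, fraction=0.5)
print(subset)
```

In this sketch, the selected ids would then form the subset used for inoculation fine-tuning, in place of a random sample of the challenge set.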

Paper Structure

This paper contains 24 sections, 5 figures, and 5 tables.

Figures (5)

  • Figure 1: Visualization of the inoculation by fine-tuning method and its potential outcomes. Figure adapted from Liu et al. (2019).
  • Figure 2: Example of a model error on Adversarial SQuAD. Figure adapted from Jia and Liang (2017).
  • Figure 3: Data map for the Adversarial SQuAD challenge set. Note the high F1 variance (indicating ambiguity) on many distracting examples.
  • Figure 4: Example of an ambiguous example as identified by dataset cartography. Note the complexity of the context passage and open-ended nature of the question.
  • Figure 5: Results of inoculation. Plotting EM scores instead of F1 scores produces similar trends.