Efficient and Robust Question Answering from Minimal Context over Documents

Sewon Min, Victor Zhong, Richard Socher, Caiming Xiong

TL;DR

The paper tackles scalable QA over large document collections and robustness to adversarial inputs by showing that most questions can be answered from a small set of sentences. It introduces a dynamic sentence selector that picks a minimal, question-specific context and feeds it to a competitive QA model (DCN+ or S-Reader), achieving substantial training (up to $15\times$) and inference (up to $13\times$) speedups with accuracy comparable to or better than full-document QA. Across SQuAD, NewsQA, TriviaQA, and SQuAD-Open (and adversarial variants), the approach often reaches or surpasses state-of-the-art performance and exhibits improved robustness to adversarial perturbations. The method combines a shared encoder, attention-based sentence scoring, and three training techniques (weight transfer, data modification, score normalization) to enable per-question dynamic sentence selection, offering a practical and scalable solution for real-world QA over large corpora.

Abstract

Neural models for question answering (QA) over documents have achieved significant performance improvements. Although effective, these models do not scale to large corpora due to their complex modeling of interactions between the document and the question. Moreover, recent work has shown that such models are sensitive to adversarial inputs. In this paper, we study the minimal context required to answer the question, and find that most questions in existing datasets can be answered with a small set of sentences. Inspired by this observation, we propose a simple sentence selector to select the minimal set of sentences to feed into the QA model. Our overall system achieves significant reductions in training (up to 15 times) and inference times (up to 13 times), with accuracy comparable to or better than the state-of-the-art on SQuAD, NewsQA, TriviaQA and SQuAD-Open. Furthermore, our experimental results and analyses show that our approach is more robust to adversarial inputs.

Paper Structure

This paper contains 33 sections, 6 equations, 5 figures, 15 tables.

Figures (5)

  • Figure 1: Venn diagram of the questions answered correctly (on exact match (EM)) by the model given a full document (Full) and the model given an oracle sentence (Oracle) on SQuAD (left) and NewsQA (right).
  • Figure 2: Our model architecture. (a) Overall pipeline, consisting of the sentence selector and the QA model. The selection score of each sentence is computed in parallel; sentences with a selection score above the threshold are then merged and fed into the QA model. (b) Shared encoder of the sentence selector and S-Reader (QA model), which takes the document and the question as inputs and outputs the document encodings $D^{enc}$ and question encodings $Q^{enc}$. (c) Decoder of S-Reader (QA model), which takes $D^{enc}$ and $Q^{enc}$ as inputs and outputs the scores for start and end positions. (d) Decoder of the sentence selector, which takes $D^{enc}$ and $Q^{enc}$ for each sentence and outputs a score indicating whether the question is answerable given the sentence.
  • Figure 3: The distributions of the number of sentences that our selector selects using the Dyn method on the dev set of SQuAD (left) and NewsQA (right).
  • Figure 4: (Top) The trade-off between the number of selected sentences and accuracy on SQuAD and NewsQA. Dyn outperforms Top k in accuracy with a similar number of sentences. (Bottom) Number of selected sentences as a function of the threshold.
  • Figure 5: (Left) Venn diagram of the questions answered correctly by Full and with Minimal. (Middle and Right) Error cases from Full (Middle) and Minimal (Right), broken down by which sentence the model's prediction comes from.
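
The selection step described in the Figure 2(a) caption (score each sentence, keep those above a threshold, merge them for the QA model) and the Top k baseline named in Figure 4 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the example scores, the strict `>` comparison, and the fallback to the single best sentence when nothing clears the threshold are all assumptions made for this sketch. The scores are taken as given; in the paper they come from the sentence selector's decoder.

```python
from typing import List, Sequence


def select_dynamic(scores: Sequence[float], threshold: float) -> List[int]:
    """Indices of sentences scoring above `threshold`, in document order.

    Assumption of this sketch: if no sentence clears the threshold, fall back
    to the single best-scoring sentence so the QA model never sees an empty
    context.
    """
    if not scores:
        return []
    chosen = [i for i, s in enumerate(scores) if s > threshold]
    if not chosen:
        chosen = [max(range(len(scores)), key=lambda i: scores[i])]
    return chosen


def select_top_k(scores: Sequence[float], k: int) -> List[int]:
    """Fixed-size baseline: the k best-scoring sentences, in document order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def merge_context(sentences: Sequence[str], indices: Sequence[int]) -> str:
    """Join the selected sentences into the reduced context for the QA model."""
    return " ".join(sentences[i] for i in indices)


# Hypothetical sentences and selection scores for one question.
sentences = ["Sentence A.", "Sentence B.", "Sentence C.", "Sentence D."]
scores = [0.05, 0.80, 0.10, 0.60]

dyn_indices = select_dynamic(scores, threshold=0.5)   # two sentences pass
topk_indices = select_top_k(scores, k=1)              # always exactly one
context = merge_context(sentences, dyn_indices)
```

With these example scores, the threshold-based selector keeps two sentences while Top k with `k=1` keeps one. The number of sentences passed on therefore varies per question under the threshold rule, which is the behaviour whose distribution Figure 3 reports.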