
Evidence Aggregation for Answer Re-Ranking in Open-Domain Question Answering

Shuohang Wang, Mo Yu, Jing Jiang, Wei Zhang, Xiaoxiao Guo, Shiyu Chang, Zhiguo Wang, Tim Klinger, Gerald Tesauro, Murray Campbell

TL;DR

This work targets open-domain QA by showing that evidence aggregated across passages can improve answer accuracy. It introduces two re-ranking approaches, strength-based and coverage-based, that re-score the top-K candidate answers produced by a strong reading comprehension (RC) model. The strength-based approach rewards candidates whose evidence is repeated across many passages. The coverage-based approach rewards candidates whose combined evidence covers more of the question. Combining the two achieves state-of-the-art results on Quasar-T, SearchQA, and the open-domain version of TriviaQA, with notable gains in EM and F1, and demonstrates the value of multi-passage reasoning over single-passage baselines. The findings highlight the potential of explicit evidence aggregation for open-domain QA and suggest extensions to more complex multi-passage reasoning tasks.
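As a rough illustration of the strength-based idea, the sketch below pools passage-level predictions and scores each distinct answer either by how often it was predicted (counting) or by the sum of its RC probabilities. The `(answer_text, probability)` input format and the `normalize` helper are assumptions made for illustration; this is not the paper's implementation, and its exact aggregation may differ.

```python
from collections import defaultdict
from typing import List, Tuple


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so surface variants of an answer match (assumed)."""
    return " ".join(answer.lower().split())


def strength_rerank(
    candidates: List[Tuple[str, float]], use_probability: bool = False
) -> List[Tuple[str, float]]:
    """Re-rank candidates by aggregated evidence strength across passages.

    candidates: hypothetical (answer_text, probability) pairs, one per passage-level prediction.
    Counting mode adds 1 per occurrence; probability mode adds the RC model's score.
    """
    scores = defaultdict(float)
    for answer, prob in candidates:
        scores[normalize(answer)] += prob if use_probability else 1.0
    # Highest aggregated strength first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


preds = [("Paris", 0.5), ("Lyon", 0.9), ("paris", 0.25), ("Paris ", 0.25)]
print(strength_rerank(preds))                        # [('paris', 3.0), ('lyon', 1.0)]
print(strength_rerank(preds, use_probability=True))  # [('paris', 1.0), ('lyon', 0.9)]
```

A single-passage baseline would pick Lyon (0.9), while both aggregation modes prefer Paris because three passages support it, which is the repetition-of-evidence case of Figure 1(a).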

Abstract

A popular recent approach to answering open-domain questions is to first search for question-related passages and then apply reading comprehension models to extract answers. Existing methods usually extract answers from single passages independently. However, some questions require a combination of evidence from across different sources to answer correctly. In this paper, we propose two models which make use of multiple passages to generate their answers. Both use an answer re-ranking approach which reorders the answer candidates generated by an existing state-of-the-art QA model. We propose two methods, namely, strength-based re-ranking and coverage-based re-ranking, to make use of the aggregated evidence from different passages to better determine the answer. Our models have achieved state-of-the-art results on three public open-domain QA datasets: Quasar-T, SearchQA and the open-domain version of TriviaQA, with about 8 percentage points of improvement on the former two datasets.
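To make the contrast between single-passage extraction and re-ranking concrete, the following sketch assumes a hypothetical RC output of `(answer_span, score)` pairs per retrieved passage. `single_passage_answer` mimics the independent, per-passage baseline, and `top_k_candidates` pools spans across passages into the candidate list a re-ranker would then re-score. Both function names and the lowercase normalization are illustrative assumptions, not the paper's code.

```python
from typing import List, Tuple

# Hypothetical RC output: one list of (answer_span, score) pairs per retrieved passage.
PassagePredictions = List[Tuple[str, float]]


def single_passage_answer(per_passage: List[PassagePredictions]) -> str:
    """Baseline: return the single highest-scoring span, ignoring cross-passage evidence."""
    best_span, _ = max(
        (pred for preds in per_passage for pred in preds), key=lambda p: p[1]
    )
    return best_span


def top_k_candidates(per_passage: List[PassagePredictions], k: int) -> List[str]:
    """Pool spans from all passages and keep the k distinct answers with the best score.

    A re-ranker would then re-score these k candidates using evidence from all passages.
    """
    best_score = {}
    for preds in per_passage:
        for span, score in preds:
            key = span.lower().strip()  # assumed normalization so "Paris" == "paris"
            best_score[key] = max(score, best_score.get(key, float("-inf")))
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(best_score, key=lambda a: (-best_score[a], a))
    return ranked[:k]


per_passage = [
    [("Lyon", 0.9), ("Paris", 0.5)],
    [("Paris", 0.6)],
    [("Paris", 0.55), ("Marseille", 0.2)],
]
print(single_passage_answer(per_passage))  # Lyon
print(top_k_candidates(per_passage, k=2))  # ['lyon', 'paris']
```

The baseline commits to Lyon from one passage, whereas keeping the top-K candidates leaves Paris in play so that evidence from the other passages can still promote it.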

Paper Structure

This paper contains 29 sections, 7 equations, 3 figures, 5 tables.

Figures (3)

  • Figure 1: Two examples of questions and candidate answers. (a) A question benefiting from the repetition of evidence. The correct answer A2 is supported by multiple passages, while the wrong answer A1 has only a single supporting passage. (b) A question benefiting from the union of multiple pieces of evidence to support the answer. The correct answer A2 has evidence passages that can match both the first half and the second half of the question, while the wrong answer A1 has evidence passages covering only the first half.
  • Figure 2: An overview of the full re-ranker. It consists of strength-based and coverage-based re-ranking.
  • Figure 3: Performance decomposition according to the length of answers and the question types.
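Figures 1(b) and 2 suggest how coverage and strength fit together. The sketch below is a deliberately simplified stand-in, not the paper's coverage model: coverage is measured as the fraction of question content tokens that appear in the union of a candidate's evidence passages, using an assumed stopword list. The linear interpolation with weight `alpha` used to combine it with a max-normalized strength score is also an assumption, not necessarily how the full re-ranker combines the two. `STOPWORDS`, `content_tokens`, `coverage_score`, and `full_rerank` are hypothetical names.

```python
import re
from typing import Dict, List, Set

# Assumed stopword list for this toy token-overlap measure.
STOPWORDS = {"the", "a", "an", "of", "in", "is", "was", "which", "what", "who", "and", "to"}


def content_tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens minus stopwords."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def coverage_score(question: str, evidence: List[str]) -> float:
    """Fraction of question tokens covered by the UNION of a candidate's evidence passages."""
    q = content_tokens(question)
    if not q:
        return 0.0
    covered = set().union(*(content_tokens(p) for p in evidence))
    return len(q & covered) / len(q)


def full_rerank(
    question: str,
    evidence_by_answer: Dict[str, List[str]],
    strength: Dict[str, float],
    alpha: float = 0.5,
) -> List[str]:
    """Hypothetical combination: alpha * max-normalized strength + (1 - alpha) * coverage."""
    max_strength = max(strength.values(), default=0.0) or 1.0
    scores = {
        a: alpha * strength.get(a, 0.0) / max_strength
        + (1 - alpha) * coverage_score(question, passages)
        for a, passages in evidence_by_answer.items()
    }
    return sorted(scores, key=lambda a: (-scores[a], a))


question = "Which river flows through Vienna and Budapest?"
evidence = {
    "danube": ["The Danube river flows through Vienna.", "Budapest lies on the Danube."],
    "inn": [
        "The Inn river flows through Innsbruck.",
        "The Inn river flows through Kufstein.",
        "The Inn river flows through Landeck.",
    ],
}
strength = {a: float(len(ps)) for a, ps in evidence.items()}  # counting-style strength
print(coverage_score(question, evidence["inn"]))  # 0.6
print(full_rerank(question, evidence, strength))  # ['danube', 'inn']
```

Strength alone (`alpha=1.0`) would prefer "inn", which has three supporting passages; adding coverage flips the ranking toward "danube", whose two passages together match both halves of the question, as in Figure 1(b).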