Evidence Aggregation for Answer Re-Ranking in Open-Domain Question Answering
Shuohang Wang, Mo Yu, Jing Jiang, Wei Zhang, Xiaoxiao Guo, Shiyu Chang, Zhiguo Wang, Tim Klinger, Gerald Tesauro, Murray Campbell
TL;DR
This work targets open-domain QA and shows that aggregating evidence across passages improves answer accuracy. It introduces two re-ranking approaches, strength-based and coverage-based, that re-score the top-K candidate answers produced by a strong reading comprehension (RC) model, using evidence consistency and evidence coverage across passages, respectively. Combining the two methods achieves state-of-the-art results on Quasar-T, SearchQA, and the open-domain version of TriviaQA, with notable gains in F1 and EM, including an improvement of about 8 percentage points on Quasar-T and SearchQA. These results demonstrate the value of multi-passage reasoning over single-passage baselines, highlight the potential of explicit evidence aggregation for open-domain QA, and suggest extensions to more complex multi-passage reasoning tasks.
Abstract
A popular recent approach to answering open-domain questions is to first search for question-related passages and then apply reading comprehension models to extract answers. Existing methods usually extract answers from single passages independently. However, some questions can only be answered correctly by combining evidence from different sources. In this paper, we propose two models that make use of multiple passages to generate their answers. Both use an answer re-ranking approach that reorders the answer candidates generated by an existing state-of-the-art QA model. The two methods, strength-based re-ranking and coverage-based re-ranking, use the evidence aggregated from different passages to better determine the answer. Our models achieve state-of-the-art results on three public open-domain QA datasets: Quasar-T, SearchQA, and the open-domain version of TriviaQA, with about 8 percentage points of improvement on the former two datasets.
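
To make the idea of strength-based re-ranking concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that each candidate answer comes with the passage it was extracted from and a score from the base QA model, and that a candidate's "strength" is either the number of passages that produced it or the sum of its scores. The abstract does not specify the scoring rule, so both variants, the function names `normalize` and `strength_rerank`, and the input format are illustrative assumptions. Coverage-based re-ranking is not shown.

```python
from collections import defaultdict


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so surface variants are merged."""
    return " ".join(answer.lower().split())


def strength_rerank(candidates, use_probability=False):
    """Re-rank candidate answers by aggregated evidence strength.

    candidates: iterable of (answer_text, passage_id, score) tuples, where
        score is the base model's confidence for that extraction
        (hypothetical input format, assumed for illustration).
    use_probability: if False, strength is the number of extractions that
        yielded the answer; if True, strength is the sum of their scores.

    Returns a list of (normalized_answer, strength), strongest first.
    """
    strength = defaultdict(float)
    for text, _passage_id, score in candidates:
        strength[normalize(text)] += score if use_probability else 1.0
    # Sort by descending strength; break ties alphabetically for determinism.
    return sorted(strength.items(), key=lambda kv: (-kv[1], kv[0]))
```

With counting, an answer extracted from many passages outranks a single high-confidence extraction; with score summation, one very confident extraction can still win. Which behaviour is preferable depends on how well calibrated the base model's scores are.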
