Multi-Granularity Guided Fusion-in-Decoder

Eunseong Choi, Hyeri Lee, Jongwuk Lee

TL;DR

MGFiD tackles spurious evidence in open-domain QA by jointly learning evidence at coarse (passage) and fine (sentence) granularities and guiding decoding with an anchor vector. It combines passage re-ranking, sentence classification, and threshold-based pruning within a multi-task FiD framework, with evidence labels augmented by pseudo-labels from LLMs. The approach yields significant EM gains on NQ and moderate gains on TQA, while reducing decoder workload by pruning to a small, relevant subset of passages. This multi-granularity strategy improves both answer accuracy and decoding efficiency, making evidence discrimination in ODQA more robust and cheaper at inference time.
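The TL;DR describes MGFiD as a multi-task FiD framework that learns answer generation, passage re-ranking, and sentence classification together. A minimal sketch of how such an objective could combine the three tasks follows. It assumes a listwise softmax loss for passage re-ranking, binary cross-entropy for sentence classification, and hypothetical task weights `lambda_p` and `lambda_s`; the paper's exact loss terms and weights are not given in this summary.

```python
import math
from typing import Sequence


def _logsumexp(xs: Sequence[float]) -> float:
    # Numerically stable log(sum(exp(x))).
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def rerank_loss(scores: Sequence[float], positives: Sequence[bool]) -> float:
    # Assumed listwise loss: -log of the softmax probability mass on evidence passages.
    pos = [s for s, is_pos in zip(scores, positives) if is_pos]
    if not pos:
        raise ValueError("at least one positive passage is required")
    return _logsumexp(scores) - _logsumexp(pos)


def sentence_loss(logits: Sequence[float], labels: Sequence[int]) -> float:
    # Mean binary cross-entropy in the stable logits form; label 1 = evident sentence.
    total = 0.0
    for z, y in zip(logits, labels):
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def multitask_loss(gen_nll: float,
                   passage_scores: Sequence[float], passage_pos: Sequence[bool],
                   sent_logits: Sequence[float], sent_labels: Sequence[int],
                   lambda_p: float = 1.0, lambda_s: float = 1.0) -> float:
    # gen_nll: answer-generation negative log-likelihood from the FiD decoder.
    # lambda_p / lambda_s: hypothetical task weights (not taken from the paper).
    return (gen_nll
            + lambda_p * rerank_loss(passage_scores, passage_pos)
            + lambda_s * sentence_loss(sent_logits, sent_labels))


if __name__ == "__main__":
    loss = multitask_loss(1.0, [0.0, 0.0], [True, False], [0.0], [1])
    print(round(loss, 4))  # -> 2.3863
```

A weighted sum is the simplest way to share one encoder across tasks; the relative weights control how strongly the evidence signals shape the representations the decoder reads.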

Abstract

In Open-domain Question Answering (ODQA), it is essential to discern relevant contexts as evidence and avoid spurious ones among retrieved results. Fusion-in-Decoder (FiD), an architecture that uses multiple concatenated contexts in the decoding phase, demonstrates promising performance but generates incorrect outputs from seemingly plausible contexts. To address this problem, we propose the Multi-Granularity guided Fusion-in-Decoder (MGFiD), discerning evidence across multiple levels of granularity. Based on multi-task learning, MGFiD harmonizes passage re-ranking with sentence classification. It aggregates evident sentences into an anchor vector that instructs the decoder. Additionally, it improves decoding efficiency by reusing the results of passage re-ranking for passage pruning. In our experiments, MGFiD outperforms existing models on the Natural Questions (NQ) and TriviaQA (TQA) datasets, highlighting the benefits of its multi-granularity solution.
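The abstract states that MGFiD aggregates evident sentences into an anchor vector that instructs the decoder. One plausible way to form such a vector is sketched here, assuming each sentence already has an embedding and a classifier logit: sentences whose predicted evidence probability reaches a hypothetical `threshold` are averaged with their probabilities as weights. The selection rule, the weighting, and the plain-mean fallback are illustrative assumptions, not MGFiD's published formulation.

```python
import math
from typing import List, Sequence


def _sigmoid(z: float) -> float:
    # Stable logistic function (avoids overflow for large |z|).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def anchor_vector(sent_embs: Sequence[Sequence[float]],
                  sent_logits: Sequence[float],
                  threshold: float = 0.5) -> List[float]:
    """Aggregate sentences predicted as evidence into one anchor vector.

    Assumption: probability-weighted mean of sentences with p >= threshold;
    if no sentence carries any weight, fall back to the plain mean of all sentences.
    """
    probs = [_sigmoid(z) for z in sent_logits]
    dim = len(sent_embs[0])
    selected = [(p, e) for p, e in zip(probs, sent_embs) if p >= threshold]
    total = sum(p for p, _ in selected)
    if total == 0.0:  # nothing selected: uniform weights over all sentences
        selected = [(1.0, e) for e in sent_embs]
        total = float(len(sent_embs))
    return [sum(p * e[d] for p, e in selected) / total for d in range(dim)]


if __name__ == "__main__":
    embs = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
    print(anchor_vector(embs, [0.0, 0.0, -10.0]))  # -> [0.5, 0.5]
```

How the anchor is injected into the decoder (for instance, appended to the encoder outputs that the decoder cross-attends to) is not specified in this summary, so the sketch stops at producing the vector.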

Paper Structure

This paper contains 23 sections, 9 equations, 8 figures, 5 tables.

Figures (8)

  • Figure 1: Examples that may harm QA systems. Bold black terms in the passages overlap with the question. (a) The passage contains a correct answer span but is not supporting evidence for the question. (b) Confusing sentences within the passage mislead the model's prediction.
  • Figure 2: The MGFiD framework incorporates multi-task learning for answer generation, using passage re-ranking to identify coarse-grained evidence and sentence classification to identify fine-grained evidence. It uses the outcomes of these tasks (threshold-based masking from passage re-ranking and an anchor embedding from sentence classification) to improve both the efficiency and the effectiveness of answer generation.
  • Figure 3: Learning solely from sentences may lead to a lack of understanding of the broader context.
  • Figure 4: An example prompt used with LLMs to filter out contexts that contain an answer span but are not evidence for the question.
  • Figure 5: (a) The average number of passages provided to the decoder as a function of $\tau$. (b) Effectiveness for varying $\tau$. We used the NQ dev set and the best MGFiD checkpoint. With $\tau = 0.05$, MGFiD significantly outperforms the setting that always uses a fixed number of 5 re-ranked passages, while passing fewer passages to the decoder.
  • ...and 3 more figures
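Figure 5 refers to a threshold $\tau$ on passage re-ranking results that decides how many passages reach the decoder. A minimal sketch of such threshold-based pruning follows, assuming the re-ranking scores are softmax-normalized over the retrieved passages and that a passage is kept when its probability is at least `tau`. The softmax normalization and the `min_keep` safeguard are our assumptions, not details confirmed by this summary.

```python
import math
from typing import List, Sequence


def prune_passages(scores: Sequence[float], tau: float = 0.05,
                   min_keep: int = 1) -> List[int]:
    """Return indices of passages kept for the decoder, highest score first.

    Assumptions: re-ranking scores are softmax-normalized over the retrieved
    passages, a passage is kept if its probability is >= tau, and at least
    `min_keep` passages always survive (a hypothetical safeguard).
    """
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # shift by max for stability
    z = sum(exps)
    probs = [e / z for e in exps]
    order = sorted(range(len(scores)), key=lambda i: probs[i], reverse=True)
    kept = [i for i in order if probs[i] >= tau]
    return kept if len(kept) >= min_keep else order[:min_keep]


if __name__ == "__main__":
    scores = [2.0, 0.0, -5.0, 1.0]
    print(prune_passages(scores, tau=0.05))  # -> [0, 3, 1]
    print(prune_passages(scores, tau=0.5))   # -> [0]
```

Unlike a fixed top-k cut, the number of surviving passages adapts per question: a confident re-ranker concentrates probability on a few passages and the decoder sees fewer inputs, while raising `tau` trades recall for a smaller decoder workload, the trade-off Figure 5 plots on the NQ dev set.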