Evidence-Enhanced Triplet Generation Framework for Hallucination Alleviation in Generative Question Answering

Haowei Du, Huishuai Zhang, Dongyan Zhao

TL;DR

The paper tackles hallucination in document-based generative question answering (GQA) by introducing EATQA, a unified triplet-generation framework that trains an LLM to learn and exploit the logical relations among Question, Evidence, and Answer via three subtasks (QA->E, QE->A, EA->Q). It adds a distribution-bridging term to align training with inference and distill evidence-derived reasoning, enabling faithful answer generation without relying on external retrievers. Evaluations on MultiRC and QASPER with Llama 2 backbones (7B–13B) show state-of-the-art performance and robust handling of long documents, with analyses indicating stronger evidence generation, reduced hallucination, and positive coupling among the subtasks. The approach preserves the model's prior knowledge while improving faithfulness, which suggests practical value for reliable document-grounded QA.

Abstract

To address hallucination in generative question answering (GQA), where the answer cannot be derived from the document, we propose a novel evidence-enhanced triplet generation framework, EATQA. It encourages the model to predict all the combinations of the (Question, Evidence, Answer) triplet by flipping the source pair and the target label to understand their logical relationships, i.e., to predict the Answer (A), Question (Q), and Evidence (E) given the QE, EA, and QA pairs, respectively. Furthermore, we bridge the distribution gap to distill the knowledge from evidence at the inference stage. Our framework ensures that the model learns the logical relation between query, evidence, and answer, which simultaneously improves evidence generation and query answering. In this paper, we apply EATQA to Llama, and it outperforms other LLM-based methods and hallucination mitigation approaches on two challenging GQA benchmarks. Further analysis shows that our method not only keeps the prior knowledge within the LLM, but also mitigates hallucination and generates faithful answers.
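
The "flipping" of source pair and target label described above can be illustrated with a minimal sketch. It only shows how one annotated (Question, Evidence, Answer) triplet could be expanded into three source/target training pairs. The `Triplet` type, the `build_subtask_examples` helper, the prompt wording, and the choice to prepend the document to every source are assumptions made for illustration; they are not the paper's actual input templates (those are given in Figure 5 of the paper and are not reproduced here).

```python
from typing import Dict, List, NamedTuple


class Triplet(NamedTuple):
    """One annotated example (hypothetical container, not from the paper)."""
    document: str
    question: str
    evidence: str
    answer: str


def build_subtask_examples(t: Triplet) -> List[Dict[str, str]]:
    """Expand one triplet into three (source, target) pairs.

    Each subtask hides one element of (Q, E, A) and asks the model to
    generate it from the other two. The prompt strings are illustrative
    assumptions, as is including the document in every source.
    """
    doc = f"Document: {t.document}\n"
    q = f"Question: {t.question}\n"
    e = f"Evidence: {t.evidence}\n"
    a = f"Answer: {t.answer}\n"
    return [
        # QE -> A: answer the question given the evidence.
        {"task": "QE->A", "source": doc + q + e + "Answer:", "target": t.answer},
        # QA -> E: recover the supporting evidence given question and answer.
        {"task": "QA->E", "source": doc + q + a + "Evidence:", "target": t.evidence},
        # EA -> Q: restore the question given evidence and answer.
        {"task": "EA->Q", "source": doc + e + a + "Question:", "target": t.question},
    ]


if __name__ == "__main__":
    example = Triplet(
        document="(document text)",
        question="(question text)",
        evidence="(evidence sentence)",
        answer="(answer text)",
    )
    for pair in build_subtask_examples(example):
        print(pair["task"], "| target:", pair["target"])
```

In a fine-tuning setup, each pair would contribute its own sequence-to-sequence loss term. How those terms are weighted, and how the distribution-bridging term is added, is specified in the paper body rather than in this sketch.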

Paper Structure (26 sections, 9 equations, 5 figures, 7 tables)

Figures (5)

  • Figure 1: An example from the MultiRC dataset. Red denotes supporting evidence and green denotes misleading sentences.
  • Figure 2: Model overview of EATQA.
  • Figure 3: Performance relationship among the three modules of our method with the 13B backbone. QEA denotes evidence-aware question answering, EAQ denotes evidence-grounded query restoration, and QAE denotes answer-aware evidence retrieval.
  • Figure 4: Attention weights across different layers with the 13B backbone. The left graph shows the attention weights of the query to the document and evidence in the Evidence-Enhanced Question Answering stage; the right graph shows the attention weights of the generated query to the evidence and answer in the Evidence-Aware Question Restoration stage.
  • Figure 5: Input templates of EATQA.