Enhancing Post-Hoc Attributions in Long Document Comprehension via Coarse Grained Answer Decomposition

Pritika Ramu, Koustava Goswami, Apoorv Saxena, Balaji Vasan Srinivasan

TL;DR

A novel approach to the factual decomposition of generated answers for attribution is proposed. It employs template-based in-context learning and integrates negative sampling during few-shot in-context learning for decomposition, enhancing the semantic understanding of both abstractive and extractive answers.

Abstract

Accurately attributing answer text to its source document is crucial for developing a reliable question-answering system. However, attribution for long documents remains largely unexplored. Post-hoc attribution systems are designed to map answer text back to the source document, yet the granularity of this mapping has not been addressed. Furthermore, a critical question arises: What exactly should be attributed? This involves identifying the specific information units within an answer that require grounding. In this paper, we propose and investigate a novel approach to the factual decomposition of generated answers for attribution, employing template-based in-context learning. To accomplish this, we utilize the question and integrate negative sampling during few-shot in-context learning for decomposition. This approach enhances the semantic understanding of both abstractive and extractive answers. We examine the impact of answer decomposition across a range of attribution approaches, from retrieval-based techniques to LLM-based attributors.
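
As a rough illustration of the decompose-then-attribute flow described above, the following Python sketch splits an answer into units and maps each unit to its best-matching document sentence. It is not the paper's method: `decompose` is a hypothetical stand-in that merely splits on sentence boundaries, where the paper uses an LLM with template-based in-context learning, and `attribute` uses plain Jaccard token overlap as an assumed, minimal retrieval-style attributor. All function names are invented for illustration.

```python
import re
from typing import Dict, List, Tuple


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokens(text: str) -> set:
    """Lowercased alphanumeric tokens, used for overlap scoring."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def decompose(answer: str) -> List[str]:
    """Hypothetical placeholder for answer decomposition.

    Assumption: one unit per sentence. The paper instead prompts an LLM
    with templates and few-shot examples to produce the units.
    """
    return split_sentences(answer)


def attribute(unit: str, doc_sentences: List[str], top_k: int = 1) -> List[Tuple[int, float]]:
    """Rank document sentences by Jaccard overlap with one answer unit."""
    unit_toks = tokens(unit)
    scored = []
    for i, sent in enumerate(doc_sentences):
        sent_toks = tokens(sent)
        union = unit_toks | sent_toks
        score = len(unit_toks & sent_toks) / len(union) if union else 0.0
        scored.append((i, score))
    # Highest overlap first; ties keep document order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def post_hoc_attribution(answer: str, document: str, top_k: int = 1) -> Dict[str, List[str]]:
    """Map each decomposed answer unit to its top-k evidence sentences."""
    doc_sentences = split_sentences(document)
    return {
        unit: [doc_sentences[i] for i, _ in attribute(unit, doc_sentences, top_k)]
        for unit in decompose(answer)
    }


if __name__ == "__main__":
    document = "The Eiffel Tower is in Paris. It was completed in 1889. Bananas are yellow."
    answer = "The tower was completed in 1889. The tower is in Paris."
    for unit, evidence in post_hoc_attribution(answer, document).items():
        print(unit, "->", evidence)
```

Swapping `decompose` for an LLM-backed function, or `attribute` for a dense retriever or LLM-based attributor, would leave the surrounding flow unchanged, which is the point of separating the two stages.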

Paper Structure

This paper contains 33 sections, 1 equation, 5 figures, 10 tables, 1 algorithm.

Figures (5)

  • Figure 1: An example from the Verifiability dataset. The input to the post-hoc attribution system is the question, document and answer. The output is evidence sentences from the document. Text marked in red does not require attribution.
  • Figure 2: Pipeline for attribution: Answers are decomposed and sent to the attributor, which identifies the supporting evidence.
  • Figure 3: Average number of decompositions per sentence using each method.
  • Figure 4: Screenshot of Microsoft Forms used for survey.
  • Figure 5: Human Annotation Error