Beyond Chunking: Discourse-Aware Hierarchical Retrieval for Long Document Question Answering

Huiyao Chen, Yi Yang, Yinghui Li, Meishan Zhang, Min Zhang

Abstract

Long document question answering systems typically process texts as flat sequences or use arbitrary segmentation, failing to capture discourse structures that guide human comprehension. We present a discourse-aware hierarchical framework that leverages rhetorical structure theory (RST) to enhance long document question answering. Our approach converts discourse trees into sentence-level representations and employs LLM-enhanced node representations to bridge structural and semantic information. The framework involves three key innovations: specialized discourse parsing for lengthy documents, LLM-based enhancement of discourse relation nodes, and structure-guided hierarchical retrieval. Comprehensive experiments on QASPER, QuALITY, and NarrativeQA demonstrate consistent improvements over existing approaches. Ablation studies confirm that incorporating discourse structure significantly enhances question answering across diverse document types.

Paper Structure

This paper contains 61 sections, 11 equations, 9 figures, 6 tables, 1 algorithm.

Figures (9)

  • Figure 1: Comparison of document modeling approaches for long document QA. Numbers (1-6) show sentence order in original document, with similar colors indicating semantic relationships. Four approaches are compared: (a) Flat sequential modeling, (b) Bottom-up semantic clustering, (c) Bisection-based adjacent grouping, and (d) Our discourse-aware approach that preserves both semantic and discourse structures.
  • Figure 2: Overview of the DISRetrieval framework. The framework consists of three main steps: (1) Discourse-Aware Tree Construction that builds a hierarchical discourse structure through two phases: constructing paragraph-level discourse trees via discourse parsing, and generating document-level tree with LLM-enhanced node representations; (2) Node Representation that integrates the trees and converts text content into dense vector representations via an encoder; (3) Hierarchical Evidence Retrieval and Selection that performs multi-level evidence retrieval and structure-aware selection to identify relevant text segments.
  • Figure 3: Ablation results of different variants.
  • Figure 4: Impact of discourse parser capability on subsequent retrieval and question answering performance. Parsers used for comparative evaluation are trained on varying data scales, ranging from 0% to 100%.
  • Figure 5: Illustration of bottom-up LLM enhancement in Phase 2 of discourse tree construction. Top: Input paragraph with its initial discourse tree structure. Center: Two-way processing strategy based on text length - using direct concatenation ($\bigoplus$) when combined length is below threshold $\tau$, and LLM-based summarization when above threshold. Bottom: Enhanced discourse tree with progressively generated semantic summaries following Equation \ref{eq:LLM_enhance}, demonstrating the transformation from rhetorical relations to concrete semantic representations.
  • ...and 4 more figures
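
The two-way strategy described in the Figure 5 caption can be illustrated with a minimal sketch. This is not the authors' implementation: the `Node` class, the `enhance` function, the `summarize` callback (a stand-in for an LLM call), the choice to measure length in whitespace-separated words, and the treatment of the boundary case (length exactly equal to $\tau$ is concatenated) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    """Hypothetical discourse tree node.

    Leaves carry sentence text; internal nodes start with only a
    rhetorical relation label and receive text during enhancement.
    """
    text: str = ""
    relation: Optional[str] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def enhance(node: Node, summarize: Callable[[str], str], tau: int) -> str:
    """Fill internal nodes bottom-up and return the text of `node`.

    Children are processed first. Their texts are concatenated when the
    combined length is at most `tau` words (assumed unit and boundary),
    and passed to `summarize` otherwise.
    """
    if node.left is None or node.right is None:
        return node.text  # leaf: keep the original sentence
    left_text = enhance(node.left, summarize, tau)
    right_text = enhance(node.right, summarize, tau)
    combined = left_text + " " + right_text
    if len(combined.split()) <= tau:
        node.text = combined             # short enough: direct concatenation
    else:
        node.text = summarize(combined)  # too long: summarizer stand-in
    return node.text
```

In this sketch, `summarize` would be replaced by an actual LLM call in practice; passing a simple stub makes the threshold behaviour easy to inspect on a small tree.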