Equipping Retrieval-Augmented Large Language Models with Document Structure Awareness

Lingnan Xu, Chong Feng, Kaiyuan Zhang, Liu Zhengyong, Wenqiang Xu, Fanqing Meng

TL;DR

This work proposes Retrieve-DocumentRoute-Read (RDR2), a novel framework that explicitly incorporates structural information throughout the RAG process, demonstrating that explicit structural awareness significantly enhances RAG systems' ability to acquire and utilize knowledge, particularly in complex scenarios requiring multi-document synthesis.

Abstract

While large language models (LLMs) demonstrate impressive capabilities, their reliance on parametric knowledge often leads to factual inaccuracies. Retrieval-Augmented Generation (RAG) mitigates this by leveraging external documents, yet existing approaches treat retrieved passages as isolated chunks, ignoring valuable structure that is crucial for document organization. Motivated by this gap, we propose Retrieve-DocumentRoute-Read (RDR2), a novel framework that explicitly incorporates structural information throughout the RAG process. RDR2 employs an LLM-based router to dynamically navigate document structure trees, jointly evaluating content relevance and hierarchical relationships to assemble optimal evidence. Our key innovation lies in formulating document routing as a trainable task, with automatic action curation and structure-aware passage selection inspired by human reading strategies. Through comprehensive evaluation on five challenging datasets, RDR2 achieves state-of-the-art performance, demonstrating that explicit structural awareness significantly enhances RAG systems' ability to acquire and utilize knowledge, particularly in complex scenarios requiring multi-document synthesis.

Paper Structure

This paper contains 32 sections, 10 equations, 8 figures, 16 tables, 1 algorithm.

Figures (8)

  • Figure 1: Performance comparison on ASQA, where RDR2 achieves the highest Exact Match (EM) score while generating the most concise responses. Readers are based on either Llama-2-13B or ChatGPT (*).
  • Figure 2: Overview of the RDR2 framework. RDR2 extends standard Retrieve-and-Read with document-structure-aware routing for iterative, fine-grained knowledge retrieval. Retrieve: input question $q$, output retrieved chunks $C_{re}$; Document Route: input $q$, $C_{re}$ and corresponding documents $D$, output routed chunks $C_{ro}$; Read: input $q$ and $C_{ro}$, output final answer $a$.
  • Figure 3: Workflow of the routing module. Given a user input $q$ and a document structure tree (Section \ref{sec:dst}) anchored by retrieved chunks, RDR2 maintains a retrieval subtree $s$ in which: (i) all structure nodes persist, and (ii) only content nodes under the currently selected headings are expanded (previously expanded ones are folded). At step $t$, the router generates actions $\{\langle a_{j}^{(t)}, p_{j}^{(t)} \rangle\}_{j=1}^{n_t} = \mathrm{Router}(q, s_t)$ to: (a) select useful content nodes, (b) unfold a promising structure node, or (c) stop routing.
  • Figure 4: Comparison between RDR2 and baselines across all datasets with different readers. We report the primary correctness metric for each dataset: Exact Match for TriviaQA, HotpotQA and ASQA, F1-5 for QAMPARI and Claim Recall for ELI5.
  • Figure 5: Scaling test-time compute on ASQA for the RDR2 framework. Left: top-$k$ scaling. Right: expand-$iter$ scaling. Exact Match (EM) is reported from both the passage aspect and the answer aspect.
  • ...and 3 more figures
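
The routing loop described in the Figure 3 caption can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `Node`, `render_view`, `toy_router` and `route` are hypothetical, the tree is anchored at a single heading for simplicity, and the LLM-based router is replaced by a keyword-overlap stand-in so that the example runs without a model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    node_id: str
    text: str            # heading title or passage text
    is_heading: bool     # True for structure nodes, False for content nodes
    children: List["Node"] = field(default_factory=list)


def render_view(root, unfolded_id):
    """Visible subtree: every heading, plus content directly under the unfolded heading."""
    visible = []

    def walk(node, parent_id):
        if node.is_heading or parent_id == unfolded_id:
            visible.append(node)
        for child in node.children:
            walk(child, node.node_id)

    walk(root, None)
    return visible


def toy_router(question, visible, selected, visited):
    """Stand-in for the LLM router (assumption): scores nodes by word overlap."""
    q_words = set(question.lower().split())

    def overlap(node):
        return len(q_words & set(node.text.lower().split()))

    # (a) select visible content nodes that look useful
    actions = [("select", n.node_id) for n in visible
               if not n.is_heading and n.node_id not in selected and overlap(n) > 0]
    # (b) unfold the most promising unvisited heading, or (c) stop
    candidates = [n for n in visible
                  if n.is_heading and n.node_id not in visited and overlap(n) > 0]
    if candidates:
        actions.append(("unfold", max(candidates, key=overlap).node_id))
    else:
        actions.append(("stop", None))
    return actions


def route(question, root, anchor_heading_id, router=toy_router, max_steps=5):
    """Iteratively apply router actions until it stops or the step budget runs out."""
    unfolded = anchor_heading_id
    visited = {anchor_heading_id}
    selected = []
    for _ in range(max_steps):
        visible = render_view(root, unfolded)
        stop = False
        for action, pointer in router(question, visible, selected, visited):
            if action == "select":
                selected.append(pointer)
            elif action == "unfold":
                unfolded = pointer      # the previously unfolded heading is folded again
                visited.add(pointer)
            elif action == "stop":
                stop = True
        if stop:
            break
    return selected


doc = Node("h0", "Solar System", True, [
    Node("h1", "Inner planets", True, [
        Node("c1", "Mercury is the smallest planet", False),
        Node("c2", "Venus has a dense atmosphere", False),
    ]),
    Node("h2", "Outer planets", True, [
        Node("c3", "Jupiter is the largest planet", False),
    ]),
])

print(route("largest outer giant", doc, "h1"))  # ['c3']
```

In this toy run the retriever is assumed to have anchored the tree at "Inner planets"; the router finds nothing useful there, unfolds "Outer planets", selects the passage about Jupiter, and then stops. In the framework itself, each routing decision would come from the trained LLM router rather than from word overlap.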