Chronological Passage Assembling in RAG framework for Temporal Question Answering

Byeongjeong Kim, Jeonghyun Park, Joonho Yang, Hwanhee Lee

TL;DR

ChronoRAG tackles narrative QA where temporal ordering is crucial by building a two-layer offline graph that encodes events and their relations, then performing online hierarchical retrieval with neighborhood assembling to produce temporally coherent context. The offline stage chunks the document into fixed-length units of size $k$, clusters $l$ chunks at a time for summarization, and extracts relations to form Layer 1, yielding a compact relational graph. The online stage retrieves Layer 1 relations together with their adjacent Layer 0 chunks to preserve narrative flow. Across NarrativeQA and GutenQA, ChronoRAG yields strong improvements, particularly on Time Questions, while remaining efficient thanks to lightweight graph construction and a single generation pass for answers. The work highlights that explicitly modeling event-to-event relations and temporal order, rather than merely extracting entities or summarizing text, is key to robust narrative question answering, with practical implications for long-form comprehension and temporal reasoning in AI systems. Here $k$ and $l$ denote the chunk size and the cluster size in the offline graph, respectively.
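The two stages described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Layer1Node`, `chunk_words`, `build_layer1`, `overlap`, and `retrieve` are hypothetical, clusters are assumed to be runs of `l` consecutive chunks, the concatenated text stands in for LLM summarization and relation extraction, and word overlap stands in for whatever similarity measure the system actually uses. The `window` parameter, which controls how many neighboring chunks are pulled in, is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class Layer1Node:
    """Hypothetical stand-in for a summarized relation in Layer 1."""
    text: str             # placeholder for an extracted relation
    chunk_ids: list[int]  # Layer 0 chunks this node was derived from


def chunk_words(words: list[str], k: int) -> list[str]:
    """Layer 0: split the document into fixed-length chunks of k words."""
    return [" ".join(words[i:i + k]) for i in range(0, len(words), k)]


def build_layer1(chunks: list[str], l: int) -> list[Layer1Node]:
    """Layer 1: group l consecutive chunks (an assumption) into one node.

    Joining the chunk text is a placeholder for summarization and
    relation extraction, which would normally call a language model.
    """
    nodes = []
    for start in range(0, len(chunks), l):
        ids = list(range(start, min(start + l, len(chunks))))
        nodes.append(Layer1Node(" ".join(chunks[i] for i in ids), ids))
    return nodes


def overlap(query: str, text: str) -> int:
    """Toy relevance score: number of shared lowercase words."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def retrieve(query, chunks, nodes, top_n=1, window=1):
    """Pick the top Layer 1 nodes, then assemble their Layer 0 chunks
    together with up to `window` neighboring chunks on each side."""
    ranked = sorted(nodes, key=lambda n: overlap(query, n.text), reverse=True)
    selected = set()
    for node in ranked[:top_n]:
        for cid in node.chunk_ids:
            lo = max(0, cid - window)
            hi = min(len(chunks) - 1, cid + window)
            selected.update(range(lo, hi + 1))
    # Sorting by chunk index restores the original narrative order.
    return [chunks[i] for i in sorted(selected)]


story = ("anna met george in paris then george left for london "
         "later anna wrote a letter to george").split()
chunks = chunk_words(story, k=3)
nodes = build_layer1(chunks, l=2)
context = retrieve("where did george go after paris", chunks, nodes)
```

The design point the sketch tries to capture is the final sort: retrieved chunks and their neighbors are returned in document order rather than in relevance order, so the context handed to the generator reads as a continuous stretch of the narrative.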

Abstract

Long-context question answering over narrative tasks is challenging because correct answers often hinge on reconstructing a coherent timeline of events while preserving contextual flow in a limited context window. Retrieval-augmented generation (RAG) methods aim to address this challenge by selectively retrieving only the necessary document segments. However, narrative texts possess unique characteristics that limit the effectiveness of these existing approaches. Specifically, understanding narrative texts requires more than isolated segments, as the broader context and sequential relationships between segments are crucial for comprehension. To address these limitations, we propose ChronoRAG, a novel RAG framework specialized for narrative texts. This approach focuses on two essential aspects: refining dispersed document information into coherent and structured passages, and preserving narrative flow by explicitly capturing and maintaining the temporal order among retrieved passages. We empirically demonstrate the effectiveness of ChronoRAG through experiments on the NarrativeQA and GutenQA datasets, showing substantial improvements in tasks requiring both factual identification and comprehension of complex sequential relationships, underscoring that reasoning over temporal order is crucial in resolving narrative QA.

Paper Structure

This paper contains 36 sections, 3 equations, 4 figures, 9 tables.

Figures (4)

  • Figure 1: Retrieval comparison for a narrative query. (a) Fine-grained indexing returns six standalone sentences, leaving key clues detached. (b) Our chronological assembling retrieves passages that include their immediate chronological context, preserving the narrative flow. Boxes indicate the directly retrieved sentences.
  • Figure 2: The offline Graph Construction pipeline of ChronoRAG. This process transforms an unstructured narrative document into a structured, two-layer graph that explicitly encodes chronological relationships.
  • Figure 3: The online Passage Retrieval process of ChronoRAG for a sample query. This demonstrates how the pre-constructed graph is used at inference time to assemble a chronologically coherent context for the LLM.
  • Figure 4: A qualitative comparison of retrieved context and model answers for the query, "Where is George Darrow residing when he prepares to join Anna Leath in France?".