Context-Efficient Retrieval with Factual Decomposition

Yanhong Li, David Yunis, David McAllester, Jiawei Zhou

TL;DR

Context-Efficient Retrieval with Factual Decomposition introduces FADER, a retrieval augmentation framework that pre-processes external corpora into atomic entity-description pairs (EDPs) to serve as retrieval units. The method combines question speculation, query-guided factual decomposition, and sampling-based KB augmentation to build a compact, semi-structured knowledge base (EDP KB) for efficient RAG. Across NarrativeQA, Qasper, and QuALITY, FADER achieves higher QA performance under constrained retrieval budgets, demonstrating improved context-efficiency and lower inference costs. By enabling more structured yet expressive internal knowledge representations, FADER offers a practical pathway toward scalable, dynamic knowledge integration in LLM-based systems.

Abstract

There has recently been considerable interest in incorporating information retrieval into large language models (LLMs). Retrieval from a dynamically expanding external corpus of text allows a model to incorporate current events and can be viewed as a form of episodic memory. Here we demonstrate that pre-processing the external corpus into semi-structured "atomic facts" makes retrieval more efficient. More specifically, we demonstrate that our particular form of atomic facts improves performance on various question answering tasks when the amount of retrieved text is limited. Limiting the amount of retrieval reduces the size of the context and improves inference efficiency.
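To make the idea of retrieving short atomic facts under a fixed context budget concrete, here is a minimal sketch. It is not the paper's implementation: the `EDP` container, the word-overlap `score` function, the whitespace-style token count, and the toy knowledge base are all assumptions chosen for illustration, and a real system would most likely use a learned retriever and the model's own tokenizer.

```python
from dataclasses import dataclass
import re


@dataclass(frozen=True)
class EDP:
    """One entity-description pair (hypothetical container, for illustration only)."""
    entity: str
    description: str

    def text(self) -> str:
        return f"{self.entity}: {self.description}"


def tokenize(s: str) -> list[str]:
    # Crude stand-in for a real tokenizer: lowercase alphanumeric runs.
    return re.findall(r"[a-z0-9]+", s.lower())


def score(query: str, edp: EDP) -> int:
    # Toy relevance (an assumption): number of distinct query words found in the EDP.
    return len(set(tokenize(query)) & set(tokenize(edp.text())))


def retrieve(query: str, kb: list[EDP], budget: int) -> list[EDP]:
    """Greedily pack the highest-scoring EDPs into a fixed token budget."""
    ranked = sorted(kb, key=lambda e: score(query, e), reverse=True)
    picked, used = [], 0
    for edp in ranked:
        if score(query, edp) == 0:
            break  # remaining entries share no words with the query
        cost = len(tokenize(edp.text()))
        if used + cost <= budget:
            picked.append(edp)
            used += cost
    return picked


# Toy knowledge base (invented data, not from any dataset in the paper).
kb = [
    EDP("Alice", "is the captain of the ship Aurora"),
    EDP("Aurora", "is a ship that sank near the coast in 1902"),
    EDP("Bob", "is a baker who lives in the village"),
]
context = retrieve("Who is the captain of the Aurora?", kb, budget=12)
print([e.text() for e in context])
```

With a budget of 12 tokens only the single most relevant pair fits, which is the regime the abstract targets: the smaller the retrieval unit, the more relevant facts can be packed into a limited context.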

Paper Structure

This paper contains 33 sections, 11 figures, 12 tables.

Figures (11)

  • Figure 1: Example of datastore used for knowledge retrieval in our approach compared with typical fixed-size text chunks in RAG. We retrieve much shorter contexts.
  • Figure 2: Overview of the FADER pipeline.
  • Figure 3: Results on NarrativeQA (top three plots), Qasper (bottom right), and QuALITY (bottom left). The x-axis shows the fixed number of tokens in the retrieval context, and the y-axis shows the QA metric used for each dataset.
  • Figure 4: Comparison of performance (y-axis) vs. number of retrieved tokens (x-axis) between Fact-Only KB construction and Question-speculated KB construction on a subset of NarrativeQA's validation set.
  • Figure 5: Performance (y-axis) vs. number of retrieved tokens (x-axis) on NarrativeQA for the number of resampled KBs equal to 1, 3, and 5.
  • ...and 6 more figures