SmartChunk Retrieval: Query-Aware Chunk Compression with Planning for Efficient Document RAG

Xuechen Zhang, Koustava Goswami, Samet Oymak, Jiasi Chen, Nedim Lipka

TL;DR

SmartChunk retrieval is a query-adaptive framework for efficient and robust long-document question answering (QA). It combines a planner that predicts the optimal chunk abstraction level for each query with a lightweight compression module that produces high-level chunk embeddings without repeated summarization.

Abstract

Retrieval-augmented generation (RAG) has strong potential for producing accurate and factual outputs by combining language models (LMs) with evidence retrieved from large text corpora. However, current pipelines are limited by static chunking and flat retrieval: documents are split into short, predetermined, fixed-size chunks, embeddings are retrieved uniformly, and generation relies on whatever chunks are returned. This design is problematic: retrieval quality is highly sensitive to chunk size, retrieval often introduces noise from irrelevant or misleading chunks, and the approach scales poorly to large corpora. We present SmartChunk retrieval, a query-adaptive framework for efficient and robust long-document question answering (QA). SmartChunk uses (i) a planner that predicts the optimal chunk abstraction level for each query, and (ii) a lightweight compression module that produces high-level chunk embeddings without repeated summarization. By adapting retrieval granularity on the fly, SmartChunk balances accuracy with efficiency and avoids the drawbacks of fixed strategies. Notably, our planner can reason about chunk abstractions through a novel reinforcement learning scheme, STITCH, which boosts accuracy and generalization. To reflect real-world applications, where users face diverse document types and query styles, we evaluate SmartChunk on five QA benchmarks plus one out-of-domain dataset. Across these evaluations, SmartChunk outperforms state-of-the-art RAG baselines while reducing cost. Further analysis demonstrates strong scalability with larger corpora and consistent gains on out-of-domain datasets, highlighting its effectiveness as a general framework for adaptive retrieval.
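
To make the two components concrete, here is a minimal sketch of query-adaptive, multi-level retrieval. It is an illustration under stated assumptions, not the paper's implementation. The names `embed`, `mean_pool`, `build_levels`, `toy_planner`, and `retrieve` are hypothetical. The hashed bag-of-words embedding stands in for a real encoder, mean pooling stands in for the learned compression module, and the keyword rule in `toy_planner` stands in for the planner trained with STITCH.

```python
import math
import zlib
from typing import Callable, List, Tuple

Vector = List[float]
# (level, start_sentence, end_sentence_exclusive, embedding)
Chunk = Tuple[int, int, int, Vector]

DIM = 32


def embed(text: str) -> Vector:
    """Toy hashed bag-of-words embedding; a stand-in for a real encoder."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    return vec


def mean_pool(vectors: List[Vector]) -> Vector:
    """Assumed stand-in for the compression module: builds a higher-level
    embedding from child embeddings, without re-reading or summarizing text."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(a: Vector, b: Vector) -> float:
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def build_levels(sentences: List[str], num_levels: int) -> List[List[Chunk]]:
    """Level 0 holds sentences; each higher level merges adjacent pairs."""
    levels: List[List[Chunk]] = [
        [(0, i, i + 1, embed(s)) for i, s in enumerate(sentences)]
    ]
    for level in range(1, num_levels):
        prev = levels[-1]
        merged: List[Chunk] = []
        for i in range(0, len(prev), 2):
            group = prev[i:i + 2]
            pooled = mean_pool([c[3] for c in group])
            merged.append((level, group[0][1], group[-1][2], pooled))
        levels.append(merged)
    return levels


def toy_planner(query: str) -> Tuple[int, int]:
    """Placeholder keyword rule, NOT the learned planner: broad queries
    are routed to coarser levels, specific queries to finer ones."""
    broad = any(w in query.lower() for w in ("summarize", "overall", "main"))
    return (1, 2) if broad else (0, 1)


def retrieve(
    query: str,
    levels: List[List[Chunk]],
    planner: Callable[[str], Tuple[int, int]],
    k: int = 3,
) -> List[Chunk]:
    """Search only the levels inside the planner's (min, max) range."""
    lo, hi = planner(query)
    q = embed(query)
    candidates = [c for level in levels[lo:hi + 1] for c in level]
    candidates.sort(key=lambda c: cosine(q, c[3]), reverse=True)
    return candidates[:k]


sentences = [
    "The cat chased a mouse across the yard.",
    "The mouse hid under the porch.",
    "Rain started falling in the afternoon.",
    "The cat went back inside the house.",
    "The family ate dinner at six.",
    "Everyone went to bed early.",
]
levels = build_levels(sentences, num_levels=3)
hits = retrieve("What did the cat chase?", levels, toy_planner, k=2)
for level, start, end, _ in hits:
    print(level, " ".join(sentences[start:end]))
```

The design point this sketch tries to capture is that the planner narrows the search to a band of abstraction levels per query, so a specific question never competes against document-level chunks and a broad question is not forced to assemble an answer from isolated sentences.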

Paper Structure

This paper contains 23 sections, 6 equations, 10 figures, 9 tables, and 2 algorithms.

Figures (10)

  • Figure 1: QA accuracy vs. monetary cost across methods. SmartChunk achieves higher accuracy at lower cost than state-of-the-art baselines.
  • Figure 2: Left: Overview of the SmartChunk framework. Compared to vanilla RAG, which uses fixed chunking and flat retrieval, SmartChunk introduces two key modules: (1) a planner $\mathcal{P}$ that predicts the smallest and largest chunk sizes per query, enabling adaptive multi-level retrieval, and (2) a Chunk Compression Encoder $\mathcal{E}$ that produces compact, high-level embeddings for aggregated chunks, lowering the cost of the multi-level representation. These additions allow SmartChunk to adapt to different query complexity and document structure, balancing accuracy and efficiency. Modules added by SmartChunk are shown in blue, while modules from vanilla RAG are shown in black. The figure distinguishes between text (represented by blocks with horizontal lines) and embeddings (shown as solid-colored blocks). Right: The STITCH method for Planner training.
  • Figure 3: Total cost including training and test-time.
  • Figure 4: (a) Performance gaps of SmartChunk over competing methods on four benchmarks—NarrativeQA (ROUGE), QASPER (F1), QuALITY (Accuracy), and Natural Questions (F1); positive bars mean SmartChunk outperforms the baseline. (b) Average chunk sizes (tokens) selected by our planner across datasets, illustrating dataset-/query-adaptive behavior.
  • Figure 5: Distribution of Min–Max Chunk Sizes Across Datasets. Higher chunk levels correspond to larger chunk sizes (ranging from sentence-level to document-level).
  • ...and 5 more figures