Rethinking Retrieval-Augmentation as Synthesis: A Query-Aware Context Merging Approach

Jiarui Guo, Yuemeng Xu, Zongwei Lv, Yangyujia Wang, Xiaolin Wang, Kan Liu, Tao Lan, Lin Qu, Tong Yang

Abstract

Retrieval-Augmented Generation (RAG) enables Large Language Models (LLMs) to extend their existing knowledge by dynamically incorporating external information. However, practical deployment is fundamentally constrained by the LLM's finite context window, forcing a trade-off between information sufficiency and token consumption. Standard pipelines address this via a retrieve-then-select strategy, typically retaining only the top-k chunks ranked by relevance. This approach is suboptimal: it truncates critical bridging evidence located in the long tail of the relevance distribution, while wasting the token budget on semantically redundant high-ranking chunks. In this paper, we rethink retrieval-augmentation as a dynamic optimization problem aimed at maximizing information density. We propose MergeRAG, a novel framework that shifts the paradigm from static filtering to query-aware synthesis. MergeRAG employs a scoring agent to restructure retrieved contexts through a dual-pathway mechanism: (1) Symmetric Merging, which consolidates weak signals to recover lost bridging evidence; and (2) Asymmetric Merging, which uses entropy-guided anchoring to eliminate redundancy without sacrificing semantic integrity. We further introduce a Hierarchical Parallel Merging strategy that mitigates information loss while maximizing computational parallelism. Extensive experiments on standard benchmarks demonstrate that MergeRAG significantly outperforms state-of-the-art RAG baselines, achieving improvements of up to 13.7 points in F1 score and 11.5 points in Exact Match (EM).
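The retrieve-then-select baseline criticized above, and the redundancy it can produce, may be illustrated with a minimal sketch on toy data. The embeddings, the use of cosine similarity as the relevance score, the 0.95 redundancy threshold, and the helper names `top_k` and `redundant_pairs` are all illustrative assumptions, not taken from the paper. The sketch covers only the baseline, not MergeRAG itself.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_k(query_vec, chunk_vecs, k):
    """Retrieve-then-select: keep the k chunks most similar to the query."""
    ranked = sorted(
        range(len(chunk_vecs)),
        key=lambda i: cosine(query_vec, chunk_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


def redundant_pairs(indices, chunk_vecs, threshold=0.95):
    """Pairs of selected chunks whose mutual similarity meets the threshold."""
    pairs = []
    for a in range(len(indices)):
        for b in range(a + 1, len(indices)):
            i, j = indices[a], indices[b]
            if cosine(chunk_vecs[i], chunk_vecs[j]) >= threshold:
                pairs.append((i, j))
    return pairs


# Toy embeddings (hypothetical): chunks 0 and 1 are near-duplicates,
# chunks 2 and 3 sit in the low-relevance tail.
query = [1.0, 1.0, 0.0]
chunks = [
    [1.0, 1.0, 0.0],
    [0.9, 1.0, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 0.2, 1.0],
]

selected = top_k(query, chunks, k=2)
dropped = [i for i in range(len(chunks)) if i not in selected]

print(selected)                           # [0, 1]
print(redundant_pairs(selected, chunks))  # [(0, 1)]
print(dropped)                            # [2, 3]
```

With a budget of two chunks, both slots go to the near-duplicate pair, and the two tail chunks are discarded regardless of whether they carry bridging evidence.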

Paper Structure (24 sections, 9 equations, 8 figures, 2 tables, 4 algorithms)

Figures (8)

  • Figure 1: Comparison between standard RAG solutions and MergeRAG.
  • Figure 2: Similarity examples of chunks. Chunks are sorted by relevance score, so chunks closer to the top-left corner have higher relevance scores. The cell at position $(i,j)$ denotes the similarity between chunks $c_i$ and $c_j$.
  • Figure 3: The information-theoretic formulation of Asymmetric Merging. We formulate redundancy elimination as maximizing the mutual information $I(c_{src};c_{anc})$. By the entropy identity, this is equivalent to minimizing the conditional entropy $H(c_{src}|c_{anc})$, which represents the extra description length of $c_{src}$ given $c_{anc}$. Minimizing the conditional entropy is in turn approximated by minimizing the Negative Log-Likelihood $\mathcal{L}_{\text{NLL}}(c_{src}|c_{anc})$.
  • Figure 4: Impact of different $k$ on F1.
  • Figure 5: Ablation on Hierarchical Parallel Merging.
  • ...and 3 more figures
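
The chain of equivalences in the Figure 3 caption can be spelled out as follows. The first two lines are the standard mutual-information identity. The token-level form of $\mathcal{L}_{\text{NLL}}$ in the last line is an assumption made here for illustration: it supposes that $c_{src}$ is a token sequence $x_1,\dots,x_T$ scored by an autoregressive language model $p_\theta$, neither of which is specified in the text above.

```latex
\begin{align}
I(c_{src};c_{anc}) &= H(c_{src}) - H(c_{src}\mid c_{anc}) \\
\arg\max_{c_{anc}} I(c_{src};c_{anc}) &= \arg\min_{c_{anc}} H(c_{src}\mid c_{anc})
  \quad \text{since } H(c_{src}) \text{ does not depend on } c_{anc} \\
H(c_{src}\mid c_{anc}) &\approx \mathcal{L}_{\text{NLL}}(c_{src}\mid c_{anc})
  = -\sum_{t=1}^{T} \log p_\theta\!\left(x_t \mid x_{<t},\, c_{anc}\right)
  \quad \text{(assumed form)}
\end{align}
```

Under this reading, a source chunk that the model can predict almost entirely from the anchor has a low negative log-likelihood, which suggests that it adds little beyond what the anchor already contains.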