
ParetoRAG: Leveraging Sentence-Context Attention for Robust and Efficient Retrieval-Augmented Generation

Ruobing Yao, Yifei Zhang, Shuang Song, Yuhua Liu, Neng Gao, Chenyang Tu

TL;DR

ParetoRAG addresses noise and redundancy in retrieval-augmented generation by applying sentence-level decomposition and a Pareto-guided weighting scheme to select the most informative sentences. It implements an unsupervised three-step pipeline—encode core sentences with their contexts, weight and rank sentence-context pairs, and generate with an LM using the top results—without additional training or API calls. The approach yields consistent gains in accuracy and fluency across open-domain QA datasets and multiple retrievers, while reducing token usage by about 70% compared to naive RAG. Furthermore, ParetoRAG complements robust training strategies, suggesting a practical route to more reliable, efficient RAG systems in real-world settings.
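The three-step pipeline described above (encode, weight and rank, generate) can be sketched as follows. This is a minimal, self-contained illustration under stated assumptions, not the authors' implementation. A toy bag-of-words `embed` stands in for a dense retriever encoder. Each core sentence's vector is mixed with its context vector using a hypothetical weight `alpha` (0.8 here, echoing the 80/20 rule). Step 3 only builds the prompt string rather than calling an LM. All function names are hypothetical.

```python
import math
import re


def embed(text: str) -> dict[str, float]:
    """Toy L2-normalised bag-of-words vector (stand-in for a dense encoder)."""
    counts: dict[str, float] = {}
    for tok in re.findall(r"[a-z0-9]+", text.lower()):
        counts[tok] = counts.get(tok, 0.0) + 1.0
    norm = math.sqrt(sum(v * v for v in counts.values()))
    return {k: v / norm for k, v in counts.items()} if norm else {}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def encode_pair(core: str, context: str, alpha: float = 0.8) -> dict[str, float]:
    """Step 1: weight the core sentence by alpha and its context by 1 - alpha.

    The linear mix and the value of alpha are assumptions for illustration.
    """
    c, x = embed(core), embed(context)
    return {k: alpha * c.get(k, 0.0) + (1 - alpha) * x.get(k, 0.0)
            for k in c.keys() | x.keys()}


def rank_sentences(query: str, passages: list[list[str]],
                   alpha: float = 0.8) -> list[tuple[float, str]]:
    """Step 2: score every (core, context) pair against the query, best first."""
    q = embed(query)
    scored = []
    for sentences in passages:
        for i, core in enumerate(sentences):
            # Context = the other sentences of the same passage.
            context = " ".join(sentences[:i] + sentences[i + 1:])
            scored.append((cosine(q, encode_pair(core, context, alpha)), core))
    return sorted(scored, key=lambda t: t[0], reverse=True)


def build_prompt(query: str, ranked: list[tuple[float, str]], top_k: int = 3) -> str:
    """Step 3: hand only the top-k core sentences to the generator LM."""
    evidence = "\n".join(f"- {s}" for _, s in ranked[:top_k])
    return f"Context:\n{evidence}\n\nQuestion: {query}\nAnswer:"


passages = [
    ["Paris is the capital of France.", "It hosts the Louvre museum.",
     "The city has about two million residents."],
    ["Berlin is the capital of Germany.", "It is known for its museums."],
]
ranked = rank_sentences("What is the capital of France?", passages)
print(ranked[0][1])  # Paris is the capital of France.
print(build_prompt("What is the capital of France?", ranked, top_k=2))
```

Mixing at the vector level (rather than scoring core and context separately) keeps one vector per sentence, so a standard nearest-neighbour index could serve the ranking step; whether ParetoRAG does exactly this is not stated in this summary.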

Abstract

While Retrieval-Augmented Generation (RAG) systems enhance Large Language Models (LLMs) by incorporating external knowledge, they still face persistent challenges: inefficient retrieval and the inability of LLMs to filter out irrelevant information. We present ParetoRAG, an unsupervised framework that optimizes RAG systems through sentence-level refinement guided by the Pareto principle. By decomposing paragraphs into sentences and dynamically re-weighting core content while preserving contextual coherence, ParetoRAG improves both retrieval precision and generation quality without requiring additional training or API resources. This framework has been empirically validated across various datasets, LLMs, and retrievers.
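Decomposing paragraphs into sentences while preserving context amounts to turning each passage into (core sentence, context) records. Below is a minimal sketch that builds such a sentence-level corpus. It assumes a naive regex splitter (a real pipeline would likely use a proper sentence tokenizer) and takes the context, as the Figure 3 caption describes, to be the other sentences of the same passage. `split_sentences` and `to_sentence_corpus` are hypothetical names.

```python
import re

# Naive splitter (assumption): break wherever ., ! or ? is followed by whitespace.
_SENT_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(paragraph: str) -> list[str]:
    return [s.strip() for s in _SENT_END.split(paragraph.strip()) if s.strip()]


def to_sentence_corpus(paragraphs: list[str]) -> list[dict]:
    """Turn each paragraph into (core sentence, context) records.

    The context of a core sentence is every other sentence from the same
    passage; passage_id keeps track of where each sentence came from.
    """
    corpus = []
    for pid, para in enumerate(paragraphs):
        sents = split_sentences(para)
        for i, core in enumerate(sents):
            corpus.append({
                "passage_id": pid,
                "core": core,
                "context": sents[:i] + sents[i + 1:],
            })
    return corpus


records = to_sentence_corpus([
    "Paris is the capital of France. It hosts the Louvre. It lies on the Seine.",
    "Berlin is the capital of Germany.",
])
print(len(records))           # 4
print(records[1]["context"])  # ['Paris is the capital of France.', 'It lies on the Seine.']
```

A single-sentence passage yields a record with an empty context, so any downstream weighting has to handle that case (for example, by falling back to the core sentence alone).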

Paper Structure

This paper contains 30 sections, 6 equations, 8 figures, 2 tables.

Figures (8)

  • Figure 1: Examples showing that a large amount of noise impedes the LLM from acquiring accurate knowledge from the retrieved content and can misdirect its reasoning. Finding the correct answer depends on the LLM's ability to identify a small portion of key information.
  • Figure 2: Comparison of traditional RAG (red path) and ParetoRAG (green path). The traditional method retrieves and directly uses entire passages, which often introduces redundant information and leads to inaccurate answers. In contrast, our method uses a preprocessed sentence-level corpus, assigning higher weights to key sentences while preserving and weighting contextual information to avoid loss of coherence. Inspired by the Pareto principle (the 80/20 rule), this design emphasizes critical information while maintaining the necessary semantic consistency. The selected sentences are then fed into the LLM, yielding more accurate answers.
  • Figure 3: An example of how ParetoRAG encodes core sentence M and core sentence M+1. Content within the same dashed box is split from the same passage. The context of a core sentence consists of the other sentences from the same passage, excluding the core sentence itself.
  • Figure 4: Comparison of ParetoRAG and Naive RAG on adaptive noise-robust LLMs (llama-2-13b-peft-nq-retrobust and llama-2-13b-peft-hotpotqa-retrobust; Yoran et al., 2024): (a)(b) show performance at the same recall size (top 10), while (c)(d) show performance at the same input word count (400).
  • Figure 5: Correct-answer rank distributions across different datasets under the same input word count (400).
  • ...and 3 more figures
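Figures 4(c)(d) and 5 compare methods at a fixed input size of 400 words. The sketch below shows one plausible way to enforce such a budget over a ranked list of units (whole passages for naive RAG, sentences for ParetoRAG). The greedy cut is an assumption: this summary does not say whether the last unit is truncated or dropped, and `pack_to_budget` is a hypothetical helper.

```python
def pack_to_budget(ranked_units: list[str], budget: int = 400) -> list[str]:
    """Keep top-ranked units, in order, while the total word count fits the budget.

    Assumption: a unit that would overflow the budget is dropped and packing
    stops there (no partial truncation).
    """
    kept: list[str] = []
    used = 0
    for unit in ranked_units:
        n = len(unit.split())
        if used + n > budget:
            break
        kept.append(unit)
        used += n
    return kept


passages = [" ".join(["p"] * 150)] * 4    # four 150-word passages
sentences = [" ".join(["s"] * 20)] * 30   # thirty 20-word sentences
print(len(pack_to_budget(passages)))   # 2 (300 words; a third would exceed 400)
print(len(pack_to_budget(sentences)))  # 20 (exactly 400 words)
```

With passage-level units the budget is spent on a couple of whole passages, whereas sentence-level units let many top-ranked sentences share the same 400 words, which is the setting in which the figures compare the two methods.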