ParetoRAG: Leveraging Sentence-Context Attention for Robust and Efficient Retrieval-Augmented Generation
Ruobing Yao, Yifei Zhang, Shuang Song, Yuhua Liu, Neng Gao, Chenyang Tu
TL;DR
ParetoRAG addresses noise and redundancy in retrieval-augmented generation by applying sentence-level decomposition and a Pareto-guided weighting scheme to select the most informative sentences. It implements an unsupervised three-step pipeline: encode core sentences together with their contexts, weight and rank the resulting sentence-context pairs, and generate with an LM from the top-ranked results. None of these steps requires additional training or API calls. The approach yields consistent gains in accuracy and fluency across open-domain QA datasets and multiple retrievers, while reducing token usage by about 70% compared with naive RAG. Furthermore, ParetoRAG complements robust training strategies, suggesting a practical route to more reliable and efficient RAG systems in real-world settings.
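To make steps two and three of the pipeline concrete, here is a minimal sketch of ranking sentence-context pairs and building a prompt from the top results. It is not the paper's implementation. The bag-of-words encoder, the linear blend controlled by the hypothetical weight `alpha`, and the helper names (`bow`, `cosine`, `rank_pairs`, `build_prompt`) are all illustrative assumptions standing in for a real retriever embedding and for ParetoRAG's actual weighting scheme.

```python
import math
import re
from collections import Counter


def bow(text: str) -> Counter:
    # Hypothetical stand-in encoder: lowercase word counts, not a real retriever embedding.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_pairs(query, pairs, alpha=0.7, top_k=2):
    """Score (sentence, context) pairs and return the top_k core sentences.

    alpha is a hypothetical weight on the core sentence; (1 - alpha) goes to its context.
    """
    q = bow(query)
    scored = []
    for sentence, context in pairs:
        score = alpha * cosine(q, bow(sentence)) + (1 - alpha) * cosine(q, bow(context))
        scored.append((score, sentence))
    scored.sort(key=lambda x: x[0], reverse=True)
    return [s for _, s in scored[:top_k]]


def build_prompt(query, sentences):
    # Step 3 (sketch): pass only the selected sentences to the LM, not whole paragraphs.
    evidence = "\n".join(f"- {s}" for s in sentences)
    return f"Answer using the evidence below.\n{evidence}\nQuestion: {query}\nAnswer:"


pairs = [
    ("Paris is the capital of France.",
     "France is in Europe. Paris is the capital of France. It hosts the Louvre."),
    ("The Louvre is a museum.",
     "Paris is the capital of France. The Louvre is a museum."),
    ("Bananas are yellow.", "Fruit varies in color. Bananas are yellow."),
]
top = rank_pairs("What is the capital of France?", pairs, top_k=1)
print(top)  # ['Paris is the capital of France.']
```

Blending the sentence score with a context score lets a sentence that is only weakly matched on its own still benefit from a relevant neighborhood, while the prompt carries only the selected sentences rather than full paragraphs.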
Abstract
While Retrieval-Augmented Generation (RAG) systems enhance Large Language Models (LLMs) by incorporating external knowledge, they still face two persistent challenges: retrieval is often inefficient, and LLMs struggle to filter out irrelevant information. We present ParetoRAG, an unsupervised framework that optimizes RAG systems through sentence-level refinement guided by the Pareto principle. By decomposing paragraphs into sentences and dynamically re-weighting core content while preserving contextual coherence, ParetoRAG improves both retrieval precision and generation quality without requiring additional training or API resources. We empirically validate the framework across various datasets, LLMs, and retrievers.
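The decomposition step can be illustrated with a small sketch that splits a paragraph into sentences and pairs each sentence with a local context window, so that a core sentence keeps enough surrounding text to stay coherent. The regex splitter and the neighbor window size `window` are assumptions made for illustration; the abstract does not specify how ParetoRAG segments sentences or how much context it attaches, and the function names here are hypothetical.

```python
import re


def split_sentences(paragraph: str) -> list:
    # Naive splitter (an assumption): break after '.', '!' or '?' followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [s.strip() for s in parts if s.strip()]


def sentence_context_pairs(paragraph: str, window: int = 1) -> list:
    """Pair each sentence with itself plus up to `window` neighbors on each side.

    `window` is a hypothetical parameter controlling how much context is preserved.
    """
    sents = split_sentences(paragraph)
    pairs = []
    for i, s in enumerate(sents):
        lo, hi = max(0, i - window), min(len(sents), i + window + 1)
        pairs.append((s, " ".join(sents[lo:hi])))
    return pairs


p = "France is in Europe. Paris is the capital of France. It hosts the Louvre."
for sentence, context in sentence_context_pairs(p, window=1):
    print(repr(sentence), "->", repr(context))
```

Keeping neighbors in the context string means a sentence such as "It hosts the Louvre." can still be interpreted (here, "It" refers to Paris) even when the sentence itself is scored in isolation.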
