
LevelRAG: Enhancing Retrieval-Augmented Generation with Multi-hop Logic Planning over Rewriting Augmented Searchers

Zhuocheng Zhang, Yang Feng, Min Zhang

TL;DR

LevelRAG addresses limitations of traditional RAG by decoupling query rewriting from retriever-specific optimization and introducing a hierarchical retrieval framework. A high-level searcher conducts multi-hop logic planning and aggregates results, while three low-level searchers (sparse, dense, web) refine queries for their respective retrievers. Empirical results across five datasets show LevelRAG outperforms strong baselines and even surpasses GPT-4o on several tasks, highlighting the effectiveness of hybrid retrieval and structured query refinement. The work demonstrates the practical potential of combining Lucene-based sparse search with semantic dense search and web augmentation to improve both completeness and accuracy in knowledge-intensive QA scenarios.

Abstract

Retrieval-Augmented Generation (RAG) is a crucial method for mitigating hallucinations in Large Language Models (LLMs) and integrating external knowledge into their responses. Existing RAG methods typically employ query rewriting to clarify the user intent and manage multi-hop logic, while using hybrid retrieval to expand search scope. However, the tight coupling of query rewriting to the dense retriever limits its compatibility with hybrid retrieval, impeding further RAG performance improvements. To address this challenge, we introduce a high-level searcher that decomposes complex queries into atomic queries, independent of any retriever-specific optimizations. Additionally, to harness the strengths of sparse retrievers for precise keyword retrieval, we have developed a new sparse searcher that employs Lucene syntax to enhance retrieval accuracy. Alongside web and dense searchers, these components seamlessly collaborate within our proposed method, LevelRAG. In LevelRAG, the high-level searcher orchestrates the retrieval logic, while the low-level searchers (sparse, web, and dense) refine the queries for optimal retrieval. This approach enhances both the completeness and accuracy of the retrieval process, overcoming challenges associated with current query rewriting techniques in hybrid retrieval scenarios. Empirical experiments conducted on five datasets, encompassing both single-hop and multi-hop question answering tasks, demonstrate the superior performance of LevelRAG compared to existing RAG methods. Notably, LevelRAG outperforms the state-of-the-art proprietary model, GPT-4o, underscoring its effectiveness and potential impact on the RAG field.
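
To make the idea of a Lucene-syntax sparse query concrete, the following is a minimal sketch of how a list of keywords could be turned into a Lucene query string. It is an illustration only: in LevelRAG the rewriting is done by prompting a language model, and the helper names (`escape_lucene`, `to_clause`, `build_lucene_query`) and the must/should/must-not split are assumptions introduced here, not part of the paper. Only standard operators of the classic Lucene query parser are used (`+` required, `-` excluded, `"..."` phrase, `^` boost).

```python
import re
from typing import Iterable, Optional

# Characters (and the two-character operators && and ||) that have special
# meaning in the classic Lucene query parser and must be backslash-escaped.
_LUCENE_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/]|&&|\|\|)')


def escape_lucene(term: str) -> str:
    """Backslash-escape Lucene special characters in a raw keyword."""
    return _LUCENE_SPECIAL.sub(r"\\\1", term)


def to_clause(keyword: str, prefix: str = "", boost: Optional[float] = None) -> str:
    """Render one keyword as a Lucene clause (hypothetical helper).

    Multi-word keywords become quoted phrases; `prefix` is "", "+" or "-".
    """
    text = escape_lucene(keyword.strip())
    if " " in text:
        text = f'"{text}"'
    if boost is not None:
        text = f"{text}^{boost}"
    return prefix + text


def build_lucene_query(
    must: Iterable[str],
    should: Iterable[str] = (),
    must_not: Iterable[str] = (),
) -> str:
    """Join required, optional and excluded keywords into one query string."""
    clauses = [to_clause(k, "+") for k in must]
    clauses += [to_clause(k) for k in should]
    clauses += [to_clause(k, "-") for k in must_not]
    return " ".join(clauses)
```

For example, `build_lucene_query(["Marie Curie"], ["Nobel Prize", "physics"], ["film"])` yields `+"Marie Curie" "Nobel Prize" physics -film`, which a Lucene-backed sparse retriever can interpret as: the phrase must appear, the next two terms raise the score, and documents mentioning "film" are excluded.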

Paper Structure

This paper contains 25 sections, 8 figures, 7 tables, 1 algorithm.

Figures (8)

  • Figure 1: Overview of LevelRAG. The user query is initially processed by the high-level searcher. The decomposed atomic queries are handled by low-level searchers, which may rewrite and refine the queries before sending them to their corresponding retrievers. The retrieved documents are aggregated and summarized by the high-level searcher, then fed to the generator to generate the response. Both the high-level searcher and the low-level searchers employ feedback from the retrieved documents to refine or supplement their outputs.
  • Figure 2: An example of how the high-level searcher processes a user query. The high-level searcher performs four key actions: decompose, summarize, verify, and supplement. Actions within blue boxes are carried out by the high-level searcher, while those within grey boxes are performed by the low-level searchers and user.
  • Figure 3: The prompt used for decomposing the question into atomic queries, which is used in the high-level searcher.
  • Figure 4: The prompt used to supplement the atomic queries, which is used in the high-level searcher.
  • Figure 5: The prompt used to rewrite the atomic queries into keywords, which is used in the sparse searcher.
  • ...and 3 more figures
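
The four high-level actions named in the Figure 2 caption (decompose, summarize, verify, supplement) can be sketched as a simple control loop. This is a rough sketch under stated assumptions, not the paper's algorithm: every callable is a hypothetical stand-in for an LLM prompt or a low-level searcher, and the `max_rounds` cap and the rule for skipping already-answered queries are choices made here for illustration.

```python
from typing import Callable, Dict, List

# A low-level searcher maps one atomic query to a list of retrieved passages.
Searcher = Callable[[str], List[str]]


def high_level_search(
    question: str,
    decompose: Callable[[str], List[str]],
    summarize: Callable[[str, List[str]], str],
    verify: Callable[[str, Dict[str, str]], bool],
    supplement: Callable[[str, Dict[str, str]], List[str]],
    searchers: List[Searcher],
    max_rounds: int = 3,  # assumed safety cap, not taken from the paper
) -> Dict[str, str]:
    """Return a mapping from each atomic query to its summarized evidence."""
    queries = decompose(question)        # decompose: question -> atomic queries
    summaries: Dict[str, str] = {}
    for _ in range(max_rounds):
        for query in queries:
            if query in summaries:
                continue                 # already answered in an earlier round
            docs: List[str] = []
            for search in searchers:     # sparse, dense, web, ...
                docs.extend(search(query))
            summaries[query] = summarize(query, docs)
        if verify(question, summaries):  # verify: is the evidence sufficient?
            break
        # supplement: ask for additional atomic queries given what is known
        new_queries = [q for q in supplement(question, summaries) if q not in summaries]
        if not new_queries:
            break                        # nothing new to ask; stop early
        queries = new_queries
    return summaries
```

The returned summaries would then be passed to the generator together with the original question. Passing the searchers in as plain callables keeps the high-level logic independent of any particular retriever, which mirrors the decoupling the abstract describes.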