Plan*RAG: Efficient Test-Time Planning for Retrieval Augmented Generation

Prakhar Verma, Sukruta Prakash Midigeshi, Gaurav Sinha, Arno Solin, Nagarajan Natarajan, Amit Sharma

TL;DR

Plan*RAG introduces test-time planning by externalizing a reasoning plan as a directed acyclic graph (DAG) that guides multi-hop retrieval-augmented generation. By decomposing queries into atomic, dynamically linked subqueries and enabling parallel execution, Plan*RAG achieves higher accuracy on standard multi-hop benchmarks while keeping compute comparable to baseline RAG methods. The approach integrates with existing RAG frameworks (e.g., Self-RAG) and demonstrates that a reasonably sized reasoning planner, including a fine-tuned Llama model, can match larger language models in planning quality. The result is a practical, modular framework for multi-hop reasoning in which each subquery can be verified explicitly and context usage stays bounded, which may benefit knowledge-intensive applications.

Abstract

We introduce Plan*RAG, a novel framework that enables structured multi-hop reasoning in retrieval-augmented generation (RAG) through test-time reasoning plan generation. While existing approaches such as ReAct maintain reasoning chains within the language model's context window, we observe that this often leads to plan fragmentation and execution failures. Our key insight is that by isolating the reasoning plan as a directed acyclic graph (DAG) outside the LM's working memory, we can enable (1) systematic exploration of reasoning paths, (2) atomic subqueries enabling precise retrievals and grounding, and (3) efficiency through parallel execution and bounded context window utilization. Moreover, Plan*RAG's modular design allows it to be integrated with existing RAG methods, thus providing a practical solution to improve current RAG systems. On standard multi-hop reasoning benchmarks, Plan*RAG consistently achieves improvements over recently proposed methods such as RQ-RAG and Self-RAG, while maintaining comparable computational costs.

Paper Structure

This paper contains 59 sections, 6 equations, 4 figures, 6 tables, 1 algorithm.

Figures (4)

  • Figure 1: Plan$^\ast$RAG improves performance on the HotpotQA benchmark substantially compared to various existing RAG methods, demonstrating the value of externalizing planning as a directed acyclic graph (DAG) outside of the LLM's context.
  • Figure 2: Comparison of RAG approaches on a HotpotQA multi-hop query. Traditional RAG methods (right) struggle with context management and implicit reasoning, where sequential operators create implicit dependencies. In contrast, Plan$^\ast$RAG (left) generates an explicit reasoning plan as a DAG ($\mathcal{G}$) at test-time, with special tags $\langle$AI,J$\rangle$ enabling dynamic information flow through parent subqueries ($\mathbf{Pa}(q)$). On this example query, previous approaches fail to identify the correct publication year, whereas Plan$^\ast$RAG successfully decomposes the reasoning process and arrives at the correct answer (1967).
  • Figure 3: Reasoning plan example: A Reasoning DAG generated by the reasoning plan expert, highlighting key advantages: only relevant information flows to each subquery, subqueries on the same depth can be executed in parallel, and the DAG structure allows for debugging and backtracking.
  • Figure 4: Efficiency analysis: (a) Token utilization comparison showing that Plan$^\ast$RAG's computational efficiency is comparable to that of baseline methods. (b) Analysis of reasoning depth on HotpotQA shows that Plan$^\ast$RAG naturally adapts to the dataset's 2-hop nature, reaching the optimal depth for 80% of queries, while other methods show inconsistent reasoning depths.
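
The captions for Figures 2 and 3 describe a plan in which each subquery receives only its parents' answers and subqueries at the same depth can run in parallel. The Python sketch below illustrates that idea under stated assumptions; it is not the authors' implementation. The plan contents, the `{Q1}`-style placeholders (standing in for the paper's special tags), and the names `depth_levels`, `run_plan`, and `answer_fn` are all hypothetical. `answer_fn` stands in for a retrieve-then-generate step.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical plan: node id -> (subquery template, parent node ids).
# "{Q1}" is an illustrative placeholder, filled with the answer of parent Q1.
plan = {
    "Q1": ("Who wrote the novel?", []),
    "Q2": ("Which studio made the film?", []),
    "Q3": ("When did {Q1} first work with {Q2}?", ["Q1", "Q2"]),
}


def depth_levels(plan):
    """Group node ids into levels; nodes in a level do not depend on each other."""
    sorter = TopologicalSorter(
        {node: parents for node, (_, parents) in plan.items()}
    )
    sorter.prepare()  # raises graphlib.CycleError if the plan is not a DAG
    levels = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every node whose parents are done
        levels.append(ready)
        sorter.done(*ready)
    return levels


def run_plan(plan, answer_fn):
    """Answer subqueries level by level, passing only parent answers downward."""
    answers = {}
    for level in depth_levels(plan):
        # Nodes within one level are independent, so this inner loop could be
        # dispatched concurrently; it is kept sequential here for clarity.
        for node in level:
            template, parents = plan[node]
            query = template.format(**{p: answers[p] for p in parents})
            answers[node] = answer_fn(query)
    return answers


if __name__ == "__main__":
    # Stub in place of retrieval and generation (an assumption for the demo).
    print(depth_levels(plan))
    print(run_plan(plan, lambda q: f"ans({q})"))
```

Because each subquery is built only from its template and its parents' answers, the context passed to `answer_fn` stays bounded regardless of how many nodes the plan contains, and a wrong intermediate answer can be traced to a single node.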