Credible Plan-Driven RAG Method for Multi-Hop Question Answering

Ningning Zhang, Chi Zhang, Zhizhong Tan, Xingxing Yang, Weiping Deng, Wenyong Wang

TL;DR

This work tackles multi-hop QA in retrieval-augmented generation by introducing PAR-RAG, a Plan-then-Act-and-Review framework that uses the semantic complexity of a question to guide reasoning. It combines complexity-aware exemplar selection, structured plan generation, a dense retrieve-and-read acting phase, and a dual verification mechanism that adapts verification strength to task difficulty. Empirical results across 2WikiMultiHopQA, HotpotQA, MuSiQue, and TriviaQA show that PAR-RAG consistently outperforms competitive baselines, with ablations confirming the pivotal roles of planning and verification. The approach advances credible, generalizable reasoning in high-stakes contexts by balancing trajectory stability and factual reliability, at the cost of higher latency and computation, which leaves room for future efficiency improvements.

Abstract

Retrieval-augmented generation (RAG) has demonstrated strong performance in single-hop question answering (QA) by integrating external knowledge into large language models (LLMs). However, its effectiveness remains limited in multi-hop QA, which demands both stable reasoning and factual consistency. Existing approaches often provide partial solutions, addressing either reasoning trajectory stability or factual verification, but rarely achieving both simultaneously. To bridge this gap, we propose PAR-RAG, a three-stage Plan-then-Act-and-Review framework inspired by the PDCA cycle. PAR-RAG incorporates semantic complexity as a unifying principle through three key components: (i) complexity-aware exemplar selection guides plan generation by aligning decomposition granularity with question difficulty, thereby stabilizing reasoning trajectories; (ii) execution follows a structured retrieve-then-read process; and (iii) dual verification identifies and corrects intermediate errors while dynamically adjusting verification strength based on question complexity: emphasizing accuracy for simple queries and multi-evidence consistency for complex ones. This cognitively inspired framework integrates theoretical grounding with practical robustness. Experiments across diverse benchmarks demonstrate that PAR-RAG consistently outperforms competitive baselines, while ablation studies confirm the complementary roles of complexity-aware planning and dual verification. Collectively, these results establish PAR-RAG as a robust and generalizable framework for reliable multi-hop reasoning.
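
To make the three stages concrete, the following is a minimal sketch of a plan-then-act-and-review loop, not the authors' implementation. Everything in it is an assumption for illustration: the toy corpus, the hard-coded plan, the word-overlap retriever, the capitalized-tail reader, and the rule that a multi-hop question needs two supporting passages while a simple one needs one. The function names (`plan`, `retrieve`, `read`, `verify`, `par_rag_sketch`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Toy corpus standing in for a real retrieval index (illustrative only).
CORPUS = [
    "Inception was directed by Christopher Nolan.",
    "Christopher Nolan was born in London.",
    "London is the capital of England.",
]


@dataclass
class Step:
    sub_question: str
    answer: str
    verified: bool


def plan(question: str) -> List[str]:
    # Hypothetical planner: a real one would prompt an LLM with exemplars
    # whose complexity matches the question. Here the plan is hard-coded.
    return ["Who directed Inception?", "Where was {prev} born?"]


def retrieve(query: str, top_k: int) -> List[str]:
    # Rank passages by word overlap with the query (stand-in for a retriever).
    q = set(query.lower().rstrip("?").split())

    def overlap(passage: str) -> int:
        return len(q & set(passage.lower().rstrip(".").split()))

    return sorted(CORPUS, key=overlap, reverse=True)[:top_k]


def read(evidence: List[str]) -> str:
    # Toy reader: return the trailing capitalized words of the top passage.
    words = evidence[0].rstrip(".").split()
    tail = []
    for word in reversed(words):
        if not word[0].isupper():
            break
        tail.append(word)
    return " ".join(reversed(tail))


def verify(answer: str, evidence: List[str], min_support: int) -> bool:
    # Accept the answer only if enough retrieved passages mention it.
    return bool(answer) and sum(answer in p for p in evidence) >= min_support


def par_rag_sketch(question: str, hops: int) -> List[Step]:
    # Assumed rule: one supporting passage suffices for a simple question,
    # two are required for a multi-hop one.
    min_support = 1 if hops <= 1 else 2
    steps: List[Step] = []
    prev = ""
    for template in plan(question):                  # Plan
        sub_q = template.replace("{prev}", prev)
        evidence = retrieve(sub_q, top_k=3)          # Act: retrieve
        answer = read(evidence)                      # Act: read
        ok = verify(answer, evidence, min_support)   # Review
        steps.append(Step(sub_q, answer, ok))
        prev = answer
    return steps


steps = par_rag_sketch("Where was the director of Inception born?", hops=2)
print(steps[-1].answer)  # London
```

The sketch only records whether each intermediate answer passed review. A fuller version would also act on a failed check, for example by re-retrieving or revising the step, which is the correction behavior the abstract describes.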

Paper Structure

This paper contains 41 sections, 13 equations, 7 figures, 8 tables, and 1 algorithm.

Figures (7)

  • Figure 1: Comparison of the proposed PAR-RAG (c) with Standard RAG (a) and Iterative Reasoning RAG (b).
  • Figure 2: Overview of the PAR-RAG workflow, which follows a Plan-then-Act-and-Review cycle: planning generates exemplar-aligned reasoning steps, acting executes them to produce answers, and reviewing verifies the intermediate results for accuracy and consistency.
  • Figure 3: Confusion Matrix of the BERT classifier based on semantic entropy and other multi-dimensional features.
  • Figure 4: Accuracy Gains from Complexity-Aligned Exemplar Selection over Alternative Strategies.
  • Figure 5: Impact of Misclassified Hop Counts on Final Answer Accuracy.
  • ...and 2 more figures