Credible Plan-Driven RAG Method for Multi-Hop Question Answering
Ningning Zhang, Chi Zhang, Zhizhong Tan, Xingxing Yang, Weiping Deng, Wenyong Wang
TL;DR
This work tackles multi-hop QA in retrieval-augmented generation (RAG) by introducing PAR-RAG, a Plan-then-Act-and-Review framework that grounds reasoning in semantic complexity. It combines complexity-aware exemplar selection, structured plan generation, a dense retrieve-and-read acting phase, and a dual verification mechanism that adapts verification strength to task difficulty. Empirical results on 2WikiMultiHopQA, HotpotQA, MuSiQue, and TriviaQA show that PAR-RAG consistently outperforms competitive baselines, and ablations confirm that both planning and verification are essential. The approach advances credible, generalizable reasoning in high-stakes contexts by balancing trajectory stability and factual reliability. It does so at the cost of higher latency and computation, which leaves efficiency improvements to future work.
Abstract
Retrieval-augmented generation (RAG) has demonstrated strong performance in single-hop question answering (QA) by integrating external knowledge into large language models (LLMs). However, its effectiveness remains limited in multi-hop QA, which demands both stable reasoning and factual consistency. Existing approaches often provide partial solutions, addressing either reasoning trajectory stability or factual verification, but rarely both. To bridge this gap, we propose PAR-RAG, a three-stage Plan-then-Act-and-Review framework inspired by the PDCA (Plan-Do-Check-Act) cycle. PAR-RAG uses semantic complexity as a unifying principle across three key components: (i) complexity-aware exemplar selection guides plan generation by aligning decomposition granularity with question difficulty, thereby stabilizing reasoning trajectories; (ii) execution follows a structured retrieve-then-read process; and (iii) dual verification identifies and corrects intermediate errors while dynamically adjusting verification strength to question complexity, emphasizing accuracy for simple queries and multi-evidence consistency for complex ones. This cognitively inspired framework integrates theoretical grounding with practical robustness. Experiments across diverse benchmarks demonstrate that PAR-RAG consistently outperforms competitive baselines, while ablation studies confirm the complementary roles of complexity-aware planning and dual verification. Collectively, these results establish PAR-RAG as a robust and generalizable framework for reliable multi-hop reasoning.
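The act and review stages described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`required_support`, `verify`, `plan_act_review`), the scalar `complexity` score in [0, 1], the 0.5 threshold, the substring-based support check, and the policy of retrying with a wider top-k are all hypothetical stand-ins, since the abstract does not specify how these pieces are realized. Plan generation with exemplar selection is left out; the plan is simply passed in as a list of sub-questions, and the retriever and reader are caller-supplied callables.

```python
from typing import Callable, List, Tuple


def required_support(complexity: float, threshold: float = 0.5) -> int:
    """Toy rule (assumption): simple questions need one supporting passage,
    complex questions need at least two that agree."""
    return 1 if complexity < threshold else 2


def verify(answer: str, passages: List[str], complexity: float) -> bool:
    """Accept an intermediate answer only if enough passages mention it.
    Substring matching is a crude stand-in for a real verifier."""
    if not answer:
        return False
    support = sum(answer.lower() in p.lower() for p in passages)
    return support >= required_support(complexity)


def plan_act_review(
    plan: List[str],
    retrieve: Callable[[str, int], List[str]],
    read: Callable[[str, List[str]], str],
    complexity: float,
    max_retries: int = 1,
) -> List[Tuple[str, str, bool]]:
    """Execute each planned step with retrieve-then-read, then review it.
    On a failed review, retry with a wider top-k (hypothetical policy)."""
    results = []
    for step in plan:
        answer, verified = "", False
        for attempt in range(max_retries + 1):
            passages = retrieve(step, 2 + 2 * attempt)  # act: retrieve
            answer = read(step, passages)               # act: read
            verified = verify(answer, passages, complexity)  # review
            if verified:
                break
        results.append((step, answer, verified))
    return results


if __name__ == "__main__":
    # Stub corpus, retriever, and reader, for illustration only.
    corpus = [
        "Inception was directed by Christopher Nolan.",
        "Inception is a film about dreams.",
        "Christopher Nolan directed Inception in 2010.",
    ]
    print(plan_act_review(
        ["Who directed Inception?"],
        retrieve=lambda query, k: corpus[:k],
        read=lambda query, passages: "Christopher Nolan",
        complexity=0.9,
    ))
```

In this sketch the same answer can pass review for a low-complexity question and fail for a high-complexity one, because the high-complexity setting asks for agreement across at least two passages. That mirrors the stated idea of scaling verification strength with difficulty, although the actual mechanism in PAR-RAG may differ.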
