
GraphRAG-R1: Graph Retrieval-Augmented Generation with Process-Constrained Reinforcement Learning

Chuanyue Yu, Kuo Zhao, Yuhan Li, Heng Chang, Mingjian Feng, Xiangzhe Jiang, Yufei Sun, Jia Li, Yuzhi Zhang, Jianxin Li, Ziwei Zhang

TL;DR

GraphRAG-R1 addresses the bottlenecks that graph retrieval-augmented generation (GraphRAG) faces in multi-hop reasoning by training LLMs with process-constrained reinforcement learning. It introduces a rollout-with-thinking variant of GRPO, two process-constrained rewards that control retrieval depth and computational cost (Progressive Retrieval Attenuation, PRA, and Cost-Aware F1, CAF), a three-stage phase-dependent training strategy, and hybrid graph-textual retrieval. Experiments on HotpotQA, MuSiQue, 2Wiki, and PopQA show state-of-the-art performance and strong generalization. Ablations confirm that PRA, CAF, and staged training are each necessary, and the framework works with a range of retrievers. The approach offers a scalable, flexible framework that improves reasoning in GraphRAG systems while balancing accuracy and efficiency, and it could potentially be applied across domains and extended to other graph types.
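GRPO, which the training strategy builds on, replaces a learned value critic with group-relative advantages. Several rollouts are sampled for the same question, and each rollout's reward is normalized by the group's mean and standard deviation. This summary does not describe the paper's rollout-with-thinking modification, so the sketch below shows only the standard group-normalization step. The function name, the use of the population standard deviation, and the `eps` stabilizer are assumptions.

```python
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Standard GRPO-style advantage: normalize each rollout's reward
    against the other rollouts sampled for the same question.

    Assumption: population std (pstdev) and a small eps to avoid division
    by zero; implementations differ on these details.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Four rollouts for one question, scored by some outcome reward.
advantages = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

Rollouts scoring above the group mean receive positive advantages and are reinforced. If every rollout in a group receives the same reward, all advantages are zero, and that group contributes no policy-gradient signal.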

Abstract

Graph Retrieval-Augmented Generation (GraphRAG) has shown great effectiveness in enhancing the reasoning abilities of LLMs by leveraging graph structures to represent knowledge and model complex real-world relationships. However, existing GraphRAG methods still face significant bottlenecks on complex problems that require multi-hop reasoning, because their query and retrieval phases rely largely on pre-defined heuristics and do not fully exploit the reasoning potential of LLMs. To address this problem, we propose GraphRAG-R1, an adaptive GraphRAG framework that trains LLMs with process-constrained, outcome-based reinforcement learning (RL) to enhance multi-hop reasoning. Our method can decompose complex problems, autonomously invoke retrieval tools to acquire the necessary information, and reason effectively over it. Specifically, we use a modified version of Group Relative Policy Optimization (GRPO) that supports rollout-with-thinking. Next, we design two process-constrained reward functions. To handle the shallow-retrieval problem, we design a Progressive Retrieval Attenuation (PRA) reward that encourages essential retrievals. To handle the over-thinking problem, we design a Cost-Aware F1 (CAF) reward that balances model performance against computational cost. We further design a phase-dependent training strategy with three training stages, corresponding to cold start and the two rewards. Lastly, our method adopts hybrid graph-textual retrieval to improve reasoning capacity. Extensive experiments demonstrate that GraphRAG-R1 improves LLMs' ability to solve complex reasoning problems compared with state-of-the-art GraphRAG methods on both in-domain and out-of-domain datasets. Furthermore, our framework can be flexibly integrated with various existing retrieval methods, consistently delivering performance improvements.
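The abstract states what PRA and CAF are for but does not give their formulas. The sketch below is a hedged illustration only. It assumes a geometrically decaying per-retrieval bonus for PRA, an exponential retrieval-cost discount on F1 for CAF, and a schedule in which the cold-start stage uses plain answer F1. All function names, parameters (`base`, `decay`, `cost_weight`, `pra_weight`), and functional forms are hypothetical and are not the paper's definitions.

```python
import math

# NOTE: hypothetical stand-ins. The paper's exact PRA/CAF formulas and
# stage rewards are not given here; these forms only illustrate the stated
# intent of each reward (encourage essential retrievals / penalize cost).


def pra_reward(num_retrievals: int, base: float = 1.0, decay: float = 0.5) -> float:
    """Hypothetical Progressive Retrieval Attenuation reward.

    The k-th retrieval call earns base * decay**(k-1), so early (essential)
    retrievals are rewarded and later calls add progressively less.
    """
    return sum(base * decay ** k for k in range(num_retrievals))


def caf_reward(f1: float, num_retrievals: int, cost_weight: float = 0.1) -> float:
    """Hypothetical Cost-Aware F1: answer F1 discounted by retrieval cost.

    Each retrieval call multiplies the reward by exp(-cost_weight).
    """
    return f1 * math.exp(-cost_weight * num_retrievals)


def stage_reward(stage: int, f1: float, num_retrievals: int,
                 pra_weight: float = 0.1) -> float:
    """Hypothetical three-stage schedule: cold start -> PRA -> CAF."""
    if stage == 1:  # cold start: plain answer F1 (assumed)
        return f1
    if stage == 2:  # counter shallow retrieval with a weighted PRA bonus (assumed mix)
        return f1 + pra_weight * pra_reward(num_retrievals)
    if stage == 3:  # counter over-thinking by discounting F1 per retrieval call
        return caf_reward(f1, num_retrievals)
    raise ValueError(f"unknown stage: {stage}")


# Same answer quality (F1 = 0.8) with 0-3 retrieval calls, per stage.
for n in range(4):
    print(n, stage_reward(2, 0.8, n), stage_reward(3, 0.8, n))
```

Under these assumed forms, the stage-2 reward rises with each extra retrieval by a shrinking margin, which pushes the policy past shallow lookups. The stage-3 reward falls with retrieval count at fixed F1, so extra calls must pay for themselves in answer quality.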

Paper Structure

This paper contains 34 sections, 7 equations, 4 figures, and 10 tables.

Figures (4)

  • Figure 1: An example comparing an LLM alone, GraphRAG, and GraphRAG-R1 on a complex problem: (a) the LLM (Qwen2.5-7B) answers directly, but the answer is incorrect; (b) HippoRAG2 gutierrez2025hipporag2 augments the LLM (Qwen2.5-7B) with retrieved external knowledge, but still fails to produce the correct answer to the complex problem; (c) our method decomposes the problem, autonomously invokes retrieval methods, and produces the correct answer.
  • Figure 2: An overview of GraphRAG-R1: (a) the rollout-retrieval-enhanced GRPO used to train the LLM; (b) the process-constrained reward designs, comprising the PRA and CAF rewards and the phase-dependent training strategy; (c) the hybrid graph-textual retrieval, which is more informative than text fragments alone.
  • Figure 3: A comparison of different retrieval formats. Left: F1 score. Right: token cost at test time.
  • Figure 4: Sensitivity analysis of key hyperparameters in GraphRAG-R1. Subfigures (a)-(d) show how the F1 score on the four validation sets changes as each key hyperparameter is varied within a specified range, with all other hyperparameters fixed at their main-experiment values. Gray dashed lines mark the values selected for the main experiments.
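Figures 3 and 4 report F1 scores, and the CAF reward is itself built on F1. This summary does not reproduce the paper's evaluation script. As an illustration, the sketch below implements the SQuAD-style token-level F1 widely used for open-domain QA: lowercase, strip punctuation and English articles, then compare token multisets. Whether GraphRAG-R1 uses exactly this normalization is an assumption.

```python
import re
import string
from collections import Counter


def normalize_answer(s: str) -> str:
    """SQuAD-style normalization (assumed to match the paper's metric)."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in set(string.punctuation))
    s = re.sub(r"\b(a|an|the)\b", " ", s)  # drop English articles
    return " ".join(s.split())             # collapse whitespace


def token_f1(prediction: str, ground_truth: str) -> float:
    """Harmonic mean of token precision and recall over normalized answers."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(ground_truth).split()
    # Multiset intersection counts each shared token at most min(count) times.
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


# Partial overlap: both predicted tokens are correct, but half the gold answer is missing.
score = token_f1("The Eiffel Tower", "Eiffel Tower in Paris")
```

In this example precision is 1.0 and recall is 0.5, so the score is 2/3. Token-level F1 gives partial credit for partially correct answers, unlike exact match.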