RouteRAG: Efficient Retrieval-Augmented Generation from Text and Graph via Reinforcement Learning

Yucan Guo, Miao Su, Saiping Guan, Zihao Sun, Xiaolong Jin, Jiafeng Guo, Xueqi Cheng

TL;DR

RouteRAG tackles knowledge-intensive QA by enabling a unified, reinforcement-learning–driven policy that interleaves reasoning with adaptive retrieval from both text and graphs. It introduces a two-stage GRPO-based training framework, first securing answer correctness and then optimizing retrieval efficiency to avoid unnecessary retrieval overhead. The method supports three retrieval modes—Passage, Graph, and Hybrid—via a single generation policy and uses a batch-aware efficiency reward to balance accuracy with compute cost. Experimental results across five benchmarks show RouteRAG outperforms prior multi-turn and graph-based RAG systems, particularly with smaller backbones, highlighting the value of end-to-end RL in adaptive, efficient hybrid retrieval for complex reasoning.
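The two-stage reward idea can be illustrated with a minimal sketch. This is not the paper's formula: the summary above does not give it, so the `Rollout` fields, the `weight` value, and the "saving relative to the batch average" form of the efficiency term are all assumptions made for illustration. The only part taken from standard practice is the GRPO-style normalization of rewards within a group of rollouts.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Rollout:
    correct: bool          # did the final answer match the reference?
    retrieval_cost: float  # hypothetical cost units (e.g. graph calls weighted higher)


def stage1_reward(r: Rollout) -> float:
    """Stage 1 (assumed form): reward answer correctness only."""
    return 1.0 if r.correct else 0.0


def stage2_reward(r: Rollout, batch: list, weight: float = 0.2) -> float:
    """Stage 2 (assumed form): correctness plus a batch-relative efficiency term.

    Only correct rollouts receive the efficiency term, so the policy cannot
    gain reward by skipping retrieval and answering wrongly.
    """
    base = stage1_reward(r)
    if not r.correct:
        return base
    avg_cost = mean(x.retrieval_cost for x in batch)
    if avg_cost == 0:
        return base
    # Positive when cheaper than the batch average, negative when costlier.
    saving = (avg_cost - r.retrieval_cost) / avg_cost
    saving = max(-1.0, min(1.0, saving))
    return base + weight * saving


def group_advantages(rewards: list) -> list:
    """GRPO-style advantage: standardize rewards within one group of rollouts."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:
        return [0.0] * len(rewards)
    return [(x - mu) / sigma for x in rewards]
```

In this sketch, two correct rollouts with different retrieval costs receive different stage-2 rewards, which is what lets the group-relative advantage favor the cheaper one.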

Abstract

Retrieval-Augmented Generation (RAG) integrates non-parametric knowledge into Large Language Models (LLMs), typically from unstructured texts and structured graphs. While recent progress has advanced text-based RAG to multi-turn reasoning through Reinforcement Learning (RL), extending these advances to hybrid retrieval introduces additional challenges. Existing graph-based or hybrid systems typically depend on fixed or handcrafted retrieval pipelines and cannot integrate supplementary evidence as reasoning unfolds. Moreover, while graph evidence provides relational structures crucial for multi-hop reasoning, it is substantially more expensive to retrieve. To address these limitations, we introduce RouteRAG, an RL-based framework that enables LLMs to perform multi-turn and adaptive graph-text hybrid RAG. RouteRAG jointly optimizes the entire generation process via RL, allowing the model to learn when to reason, what to retrieve from either texts or graphs, and when to produce final answers, all within a unified generation policy. To guide this learning process, we design a two-stage training framework that accounts for both task outcome and retrieval efficiency, enabling the model to exploit hybrid evidence while avoiding unnecessary retrieval overhead. Experimental results across five question answering benchmarks demonstrate that RouteRAG significantly outperforms existing RAG baselines, highlighting the benefits of end-to-end RL in supporting adaptive and efficient retrieval for complex reasoning.
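The idea of a single policy that decides when to reason, which source to query, and when to answer can be sketched as a simple control loop. Everything concrete here is hypothetical: the abstract does not specify an action syntax, so the `<passage>`, `<graph>`, `<hybrid>`, and `<answer>` tags, the `run_episode` function, the `max_turns` cap, and the choice to implement hybrid retrieval as passage results followed by graph results are assumptions for illustration only.

```python
import re
from typing import Callable, Optional

# Hypothetical action syntax; the source does not define one.
ACTION = re.compile(r"<(passage|graph|hybrid|answer)>(.*?)</\1>", re.S)


def run_episode(
    question: str,
    policy: Callable[[str], str],
    retrievers: dict,
    max_turns: int = 4,
) -> tuple:
    """Interleave policy generations with retrieval until an answer is emitted.

    Returns (answer or None, list of retrieval modes used in order).
    """
    context = question
    calls = []
    for _ in range(max_turns):
        step = policy(context)
        match = ACTION.search(step)
        if match is None:
            # Pure reasoning step: append it and let the policy continue.
            context += "\n" + step
            continue
        mode, payload = match.group(1), match.group(2).strip()
        if mode == "answer":
            return payload, calls
        if mode == "hybrid":
            # Assumed hybrid behavior: query both sources and concatenate.
            evidence = retrievers["passage"](payload) + retrievers["graph"](payload)
        else:
            evidence = retrievers[mode](payload)
        calls.append(mode)
        context += "\n" + step + "\n" + "\n".join(evidence)
    answer: Optional[str] = None  # turn budget exhausted without an answer
    return answer, calls
```

The returned list of retrieval modes is the kind of per-rollout record an efficiency-aware reward could consume, since graph and hybrid calls are described as more expensive than passage calls.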

Paper Structure

This paper contains 30 sections, 9 equations, 5 figures, 9 tables, and 1 algorithm.

Figures (5)

  • Figure 1: Previous RL-based multi-turn RAG vs. RouteRAG. Prior methods mainly focus on interleaving reasoning with passage retrieval and reward on answer correctness. RouteRAG extends retrieval to passage, graph, and hybrid modes, and is trained with a two-stage RL framework that optimizes both accuracy and efficiency.
  • Figure 2: Comparing the average retrieval turns of RouteRAG and its variant without efficiency reward.
  • Figure 3: Comparison of average reasoning steps.
  • Figure 4: Comparison in terms of performance, response token length, and reasoning turns.
  • Figure 5: Performance of RouteRAG-3B with different numbers of retrieved documents.