Reasoning by Exploration: A Unified Approach to Retrieval and Generation over Graphs

Haoyu Han, Kai Guo, Harry Shomer, Yu Wang, Yucheng Chu, Hang Li, Li Ma, Jiliang Tang

TL;DR

Reasoning by Exploration (RoE) addresses the limitations of decoupled GraphRAG systems by unifying retrieval and generation into stepwise graph exploration guided by an LLM. It introduces a two-stage training pipeline—supervised fine-tuning on gold exploration trajectories, followed by reinforcement learning with a suite of rule-based rewards—to learn robust, generalizable exploration strategies. RoE demonstrates substantial gains on KGQA benchmarks and strong generalization to unseen graphs, outperforming both LLM-only and traditional GraphRAG baselines. The approach offers a practical pathway to reliable, scalable graph reasoning in real-world settings where graphs vary across domains and tasks.

Abstract

Reasoning over structured graphs remains a fundamental challenge for Large Language Models (LLMs), particularly when scaling to large graphs. Existing approaches typically follow the retrieval-augmented generation (RAG) paradigm: first retrieving subgraphs relevant to the query and then generating answers conditioned on the retrieved subgraphs. However, such two-phase pipelines often struggle to faithfully incorporate graph structure, since the generation process is ultimately constrained by the quality and completeness of the retrieved subgraph. Although many advanced retrievers have been proposed recently to mitigate this issue, they are usually tailored to the training graphs and generalize poorly to unseen graphs, which limits their practical applicability. In this work, we propose Reasoning by Exploration (RoE), a novel approach that unifies retrieval and generation by framing reasoning over graphs as a process of graph exploration. At each step, the LLM selects candidate nodes and edges to explore, gradually constructing reasoning paths and generating answers along the way. To enable effective exploration, RoE is trained in two stages: supervised fine-tuning (SFT) on gold reasoning paths, followed by reinforcement learning (RL) to enhance exploration effectiveness and generalization. Experiments on benchmark datasets demonstrate that RoE achieves substantial overall improvements over baselines, while also generalizing effectively to unseen graphs.
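The stepwise exploration described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `explore` function, the `Selector` interface, the toy graph, and the rule-based `toy_select` stand-in for the LLM are all hypothetical names introduced here for illustration. The sketch only shows the control flow: start from a seed entity, list candidate edges at the frontier, let a selector choose which edges to follow, and collect answers along the way.

```python
from typing import Callable, Dict, List, Set, Tuple

Edge = Tuple[str, str, str]                 # (head, relation, tail)
Graph = Dict[str, List[Tuple[str, str]]]    # head -> [(relation, tail), ...]

# Hypothetical interface: given the question and the candidate edges, return
# (edges to explore next, entities judged to be answers). In RoE this role is
# played by the trained LLM; here it is just a callable.
Selector = Callable[[str, List[Edge]], Tuple[List[Edge], List[str]]]


def explore(graph: Graph, question: str, seed: str,
            select: Selector, max_steps: int = 3) -> Tuple[List[Edge], List[str]]:
    """Expand from the seed one step at a time, recording the chosen path."""
    frontier: List[str] = [seed]
    visited: Set[str] = {seed}
    path: List[Edge] = []
    answers: List[str] = []
    for _ in range(max_steps):
        # Candidate edges leave the current frontier and reach unvisited nodes.
        candidates = [(h, r, t) for h in frontier
                      for (r, t) in graph.get(h, []) if t not in visited]
        if not candidates:
            break
        chosen, found = select(question, candidates)
        path.extend(chosen)
        answers.extend(a for a in found if a not in answers)
        # The tails of the chosen edges become the next frontier.
        frontier = []
        for (_, _, tail) in chosen:
            if tail not in visited:
                visited.add(tail)
                frontier.append(tail)
        if not frontier:
            break
    return path, answers


# Toy graph and a keyword-based selector standing in for the LLM (assumptions).
toy_graph: Graph = {
    "Inception": [("directed_by", "Christopher Nolan"), ("released_in", "2010")],
    "Christopher Nolan": [("born_in", "London"), ("occupation", "Director")],
}


def toy_select(question: str, candidates: List[Edge]) -> Tuple[List[Edge], List[str]]:
    wanted = {"directed_by", "born_in"}     # relations relevant to the question
    chosen = [e for e in candidates if e[1] in wanted]
    found = [t for (_, r, t) in chosen if r == "born_in"]
    return chosen, found


path, answers = explore(toy_graph, "Where was the director of Inception born?",
                        "Inception", toy_select)
```

On this toy graph the loop follows `directed_by` and then `born_in`, so `path` holds two edges and `answers` is `["London"]`. Retrieval and generation share one loop here, which is the property the abstract emphasizes; how candidates are presented to the LLM and how its output is parsed are left out of this sketch.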

Paper Structure

This paper contains 23 sections, 14 equations, 4 figures, 3 tables, 1 algorithm.

Figures (4)

  • Figure 1: Retrieval and generalization performance of different methods on WebQSP and CWQ datasets.
  • Figure 2: The framework of RoE. The model begins from the seed entity (green) and incrementally expands to explored entities (yellow) while discovering answer entities (red). Stage 1 uses SFT to learn gold reasoning paths; the trained model then serves as the initial model for Stage 2, where RL refines the exploration policy based on reward feedback.
  • Figure 3: Generalization performance (Hit/F1) of different methods across dataset transfers.
  • Figure 4: Performance of RoE and its variants pretrained on the WebQSP dataset. Marker shapes denote different model variants, while filled and hollow markers represent the Hit and F1 metrics, respectively.
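
The figures above report Hit and F1 scores. This summary does not define them, so the sketch below assumes the definitions commonly used in KGQA evaluation: Hit is 1 if at least one predicted answer appears in the gold answer set, and F1 is the harmonic mean of set-level precision and recall. The function name `hit_and_f1` is hypothetical, and the paper's exact evaluation protocol may differ.

```python
from typing import Iterable, Tuple


def hit_and_f1(predicted: Iterable[str], gold: Iterable[str]) -> Tuple[float, float]:
    """Per-question Hit and F1 over answer sets (assumed common KGQA definitions)."""
    pred_set, gold_set = set(predicted), set(gold)
    overlap = len(pred_set & gold_set)
    hit = 1.0 if overlap > 0 else 0.0
    if overlap == 0:
        # No correct prediction: precision and recall are both zero.
        return hit, 0.0
    precision = overlap / len(pred_set)
    recall = overlap / len(gold_set)
    f1 = 2 * precision * recall / (precision + recall)
    return hit, f1


# One correct answer plus one spurious answer: precision 0.5, recall 1.0.
hit, f1 = hit_and_f1(["London", "Paris"], ["London"])
```

Dataset-level scores would typically be the mean of these per-question values.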