Graph-O1: Monte Carlo Tree Search with Reinforcement Learning for Text-Attributed Graph Reasoning

Lihui Liu

TL;DR

This paper tackles multi-hop question answering on text-attributed graphs by introducing Graph-O1, a framework that couples Monte Carlo Tree Search with end-to-end reinforcement learning to enable an LLM-based agent to selectively explore informative graph components. The method formalizes graph reasoning as an MDP with graph-function actions, uses MCTS to plan reasoning trajectories, and applies GRPO to fine-tune the policy across discovered trajectories. Empirical results across diverse domains show Graph-O1 consistently outperforms text-only, graph-RAG, and other agent-based baselines in accuracy and interpretability. The work demonstrates the value of integrating structured graph reasoning with principled planning and learning for robust, scalable graph QA.
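The GRPO step mentioned above can be illustrated with the group-relative advantage that GRPO is generally built on: several trajectories are sampled for the same question, and each trajectory's reward is normalized against the group's mean and standard deviation. The following is a minimal sketch under that assumption only. The function name, the `eps` constant, and the example rewards are hypothetical, and the paper's actual reward design and objective are not given in this summary.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each trajectory reward against its sampling group.

    `rewards` holds one scalar reward per trajectory sampled for the same
    question. `eps` (an assumed constant) avoids division by zero when all
    rewards in the group are identical.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical group of four trajectories: two reached a correct answer.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Trajectories that score above the group average receive a positive advantage and are reinforced; those below receive a negative one. When every trajectory in a group earns the same reward, all advantages are zero and the group contributes no learning signal.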

Abstract

Text-attributed graphs, where nodes and edges contain rich textual information, are widely used across diverse domains. A central challenge in this setting is question answering, which requires jointly leveraging unstructured text and the structured relational signals within the graph. Although Large Language Models (LLMs) have made significant advances in natural language understanding, their direct use for reasoning over text-attributed graphs remains limited. Retrieval-augmented generation methods that operate purely on text often treat passages as isolated units, ignoring the interconnected structure of the graph. Conversely, graph-based RAG methods that serialize large subgraphs into long textual sequences quickly become infeasible due to LLM context-length constraints, resulting in fragmented reasoning and degraded accuracy. To overcome these limitations, we introduce Graph-O1, an agentic GraphRAG framework that enables LLMs to conduct stepwise, interactive reasoning over graphs. Our approach integrates Monte Carlo Tree Search (MCTS) with end-to-end reinforcement learning, allowing the model to selectively explore and retrieve only the most informative subgraph components. The reasoning procedure is framed as a multi-turn interaction between the agent and the graph environment, and the agent is trained through a unified reward mechanism. Extensive experiments across multiple LLM backbones demonstrate that Graph-O1 consistently surpasses state-of-the-art baselines, producing answers that are more accurate, reliable, and interpretable.
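To make the idea of stepwise, selective exploration concrete, here is a minimal sketch of generic MCTS with UCT selection over a toy text-attributed graph. It is not the authors' implementation. The graph, the single "move to a neighbor" action (a simplified stand-in for the paper's graph-function actions), the exact-match reward on a known target node, and all names are assumptions made for illustration, and no LLM is involved.

```python
import math
import random

# Hypothetical toy text-attributed graph: node id -> (text, neighbor ids).
GRAPH = {
    "paper_A": ("A paper on graph reasoning", ["author_X", "venue_V"]),
    "author_X": ("A researcher in machine learning", ["paper_B"]),
    "venue_V": ("A conference", []),
    "paper_B": ("A paper on tree search", []),
}


class Node:
    def __init__(self, state, parent=None):
        self.state = state      # graph node reached at this point in the search
        self.parent = parent
        self.children = {}      # action (neighbor id) -> Node
        self.visits = 0
        self.value = 0.0        # sum of rewards backed up through this node


def neighbors(state):
    return GRAPH[state][1]


def uct_child(node, c=1.4):
    # Choose the child with the best mean reward plus exploration bonus.
    return max(
        node.children.values(),
        key=lambda ch: ch.value / ch.visits
        + c * math.sqrt(math.log(node.visits) / ch.visits),
    )


def rollout(state, target, depth, rng):
    # Random walk of at most `depth` hops; reward 1.0 if it reaches the target.
    for _ in range(depth):
        nbrs = neighbors(state)
        if state == target or not nbrs:
            break
        state = rng.choice(nbrs)
    return 1.0 if state == target else 0.0


def mcts(root_state, target, iterations=200, depth=3, seed=0):
    rng = random.Random(seed)
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # Selection: descend while the node is fully expanded and non-terminal.
        while neighbors(node.state) and len(node.children) == len(neighbors(node.state)):
            node = uct_child(node)
        # Expansion: add one untried neighbor, if any remain.
        untried = [a for a in neighbors(node.state) if a not in node.children]
        if untried:
            action = rng.choice(untried)
            node.children[action] = Node(action, parent=node)
            node = node.children[action]
        # Simulation, then backpropagation up to the root.
        reward = rollout(node.state, target, depth, rng)
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Recommend the most-visited first hop from the root.
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]


best_first_hop = mcts("paper_A", target="paper_B")
```

In this toy setting the search concentrates its visits on the branch through `author_X`, because only that branch can reach the target, while `venue_V` is visited a few times for exploration and then largely ignored. This mirrors, in a very reduced form, the goal of retrieving only the informative parts of the graph rather than serializing a large subgraph.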

Paper Structure

This paper contains 17 sections, 13 equations, 1 figure, and 7 tables.

Figures (1)

  • Figure 1: Architecture of our proposed Graph-O1 Model.