Retrieving Minimal and Sufficient Reasoning Subgraphs with Graph Foundation Models for Path-aware GraphRAG

Haonan Yuan, Qingyun Sun, Junhua Shi, Mingjun Liu, Jiaqi Yuan, Ziwei Zhang, Xingcheng Fu, Jianxin Li

TL;DR

This work revisits retrieval from a structural perspective, and proposes GFM-Retriever that directly responds to user queries with a subgraph, where a pre-trained Graph Foundation Model acts as a cross-domain Retriever for multi-hop path-aware reasoning.

Abstract

Graph-based retrieval-augmented generation (GraphRAG) exploits structured knowledge to support knowledge-intensive reasoning. However, most existing methods treat graphs as intermediate artifacts, and the few subgraph-based retrieval methods depend on heuristic rules coupled with domain-specific distributions. They fail in typical cold-start scenarios where data in target domains is scarce, thus yielding reasoning contexts that are either informationally incomplete or structurally redundant. In this work, we revisit retrieval from a structural perspective, and propose GFM-Retriever that directly responds to user queries with a subgraph, where a pre-trained Graph Foundation Model acts as a cross-domain Retriever for multi-hop path-aware reasoning. Building on this perspective, we repurpose a pre-trained GFM from an entity ranking function into a generalized retriever to support cross-domain retrieval. On top of the retrieved graph, we further derive a label-free subgraph selector optimized by a principled Information Bottleneck objective to identify the query-conditioned subgraph, which contains informationally sufficient and structurally minimal golden evidence in a self-contained "core set". To connect structure with generation, we explicitly extract and reorganize relational paths as in-context prompts, enabling interpretable reasoning. Extensive experiments on multi-hop question answering benchmarks demonstrate that GFM-Retriever achieves state-of-the-art performance in both retrieval quality and answer generation, while maintaining efficiency.

Paper Structure

This paper contains 63 sections, 4 theorems, 31 equations, 10 figures, 8 tables, 3 algorithms.

Key Result

Proposition 1

Let $\mathcal{G}$ be a knowledge graph indexed from multiple domains, and let $\{\operatorname{Dom}_d(e)\}$ denote unary predicates that indicate the domain attributes of entity $e$. For a query $\mathbf{q}$, the query-conditioned GFM can learn a rule $\textnormal{R}(\mathbf{q},e)$ if and only if …

Figures (10)

  • Figure 1: Challenges in subgraph-based RAG.
  • Figure 2: An overview of the GFM-Retriever framework. (1) Pre-training: A query-conditioned GFM is trained as a cross-domain retriever. (2) Fine-tuning: A label-free IB-optimized selector identifies minimal sufficient query-specific subgraphs. (3) Inferring: Retrieved subgraphs are transformed into path-aware in-context prompts to guide multi-hop reasoning.
  • Figure 3: Cross-domain generalizability analysis.
  • Figure 4: Retrieving time efficiency and effectiveness.
  • Figure 5: Visualizations of the retrieved subgraph.
  • ...and 5 more figures

Theorems & Definitions (4)

  • Proposition 1: Multi-domain logical expressivity of query-conditioned GFM
  • Proposition 2: Error bound of the $\mathcal{L}_{\textnormal{LIB}}$ approximation
  • Proposition 3: Lower bound of $I(\mathbf{q};\mathcal{G}_\mathbf{q})$
  • Proposition 4: Upper bound of $I(\mathcal{G};\mathcal{G}_\mathbf{q})$
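
Propositions 3 and 4 bound $I(\mathbf{q};\mathcal{G}_\mathbf{q})$ from below and $I(\mathcal{G};\mathcal{G}_\mathbf{q})$ from above, which are the two terms one would expect in an Information Bottleneck trade-off. As a sketch only, assuming the textbook IB form and a hypothetical trade-off weight $\beta > 0$ (neither is confirmed by the text shown here, and the paper's $\mathcal{L}_{\textnormal{LIB}}$ may differ), the selector's objective could be read as:

$$
\min_{\mathcal{G}_\mathbf{q} \subseteq \mathcal{G}} \; -\,I(\mathbf{q};\mathcal{G}_\mathbf{q}) \;+\; \beta\, I(\mathcal{G};\mathcal{G}_\mathbf{q})
$$

Under this reading, increasing $I(\mathbf{q};\mathcal{G}_\mathbf{q})$ keeps the selected subgraph informationally sufficient for the query, while penalizing $I(\mathcal{G};\mathcal{G}_\mathbf{q})$ keeps it structurally minimal. Substituting a lower bound for the first term and an upper bound for the second would give a tractable upper bound on the whole objective.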