
PathMind: A Retrieve-Prioritize-Reason Framework for Knowledge Graph Reasoning with Large Language Models

Yu Liu, Xixun Lin, Yanmin Shang, Yangxi Li, Shi Wang, Yanan Cao

TL;DR

PathMind tackles knowledge graph reasoning (KGR) with large language models (LLMs) through a Retrieve-Prioritize-Reason framework: it first extracts a query subgraph, then semantically prioritizes important multi-hop reasoning paths, and finally guides the LLM through a dual-phase training regime (instruction tuning and path-wise preference alignment). This design reduces noise from irrelevant paths and lowers computational cost while maintaining or improving reasoning accuracy on WebQSP and CWQ. Through extensive ablations and cross-LLM experiments, PathMind demonstrates robust gains in Hits@1 and F1, especially on complex multi-hop queries, and remains effective across different LLM backbones. The work offers a practical path toward scalable, interpretable KGR by combining graph representations, learned path prioritization, and targeted LLM training.
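The first stage extracts a query subgraph around the entities mentioned in the question. The summary does not say how, so the sketch below assumes a plain k-hop breadth-first expansion from hypothetical seed (topic) entities over a toy list of triples. The function name `k_hop_subgraph` and the example triples are illustrative only; PathMind's retrieval module, which also encodes the subgraph into graph representations (Figure 2a), is not reproduced here. Requires Python 3.9+.

```python
from collections import deque

# Hypothetical triple type: (head, relation, tail).
Triple = tuple[str, str, str]


def k_hop_subgraph(triples: list[Triple], seeds: set[str], k: int) -> list[Triple]:
    """Return the triples within k hops of the seed (topic) entities.

    Edges are followed in both directions, a common choice when extracting
    question subgraphs; this is an illustrative stand-in, not PathMind's module.
    """
    # Index every triple under both of its endpoint entities.
    index: dict[str, list[Triple]] = {}
    for h, r, t in triples:
        index.setdefault(h, []).append((h, r, t))
        index.setdefault(t, []).append((h, r, t))

    depth = {e: 0 for e in seeds}  # hop distance from the nearest seed
    queue = deque(seeds)
    kept: set[Triple] = set()
    while queue:
        entity = queue.popleft()
        if depth[entity] == k:
            continue  # entities at the hop limit are not expanded further
        for triple in index.get(entity, []):
            kept.add(triple)
            h, _, t = triple
            other = t if h == entity else h
            if other not in depth:
                depth[other] = depth[entity] + 1
                queue.append(other)
    return sorted(kept)


kg = [
    ("Obama", "born_in", "Honolulu"),
    ("Honolulu", "located_in", "Hawaii"),
    ("Hawaii", "part_of", "USA"),
    ("Obama", "spouse", "Michelle"),
]
print(k_hop_subgraph(kg, {"Obama"}, 2))
# [('Honolulu', 'located_in', 'Hawaii'), ('Obama', 'born_in', 'Honolulu'), ('Obama', 'spouse', 'Michelle')]
```

With k = 2 the `part_of` triple is excluded because `Hawaii` already sits at the hop limit; raising k to 3 would pull it in.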

Abstract

Knowledge graph reasoning (KGR) is the task of inferring new knowledge by performing logical deductions on knowledge graphs (KGs). Recently, large language models (LLMs) have demonstrated remarkable performance in complex reasoning tasks. Despite this progress, current LLM-based KGR methods still face two critical limitations. First, existing methods often extract reasoning paths indiscriminately, without assessing their relative importance, which may introduce irrelevant noise that misleads LLMs. Second, many methods leverage LLMs to dynamically explore potential reasoning paths, which incurs heavy retrieval demands and frequent LLM calls. To address these limitations, we propose PathMind, a novel framework designed to enhance faithful and interpretable reasoning by selectively guiding LLMs with important reasoning paths. Specifically, PathMind follows a "Retrieve-Prioritize-Reason" paradigm. First, a retrieval module extracts a query subgraph from the KG. Next, a path prioritization mechanism identifies important reasoning paths using a semantic-aware path priority function, which jointly considers the accumulative cost incurred so far and the estimated future cost of reaching the target. Finally, PathMind generates accurate and logically consistent responses via a dual-phase training strategy comprising task-specific instruction tuning and path-wise preference alignment. Extensive experiments on benchmark datasets demonstrate that, by identifying essential reasoning paths, PathMind consistently outperforms competitive baselines, particularly on complex reasoning tasks, while using fewer input tokens.
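The path priority function scores a partial path ending at entity e by the accumulative cost d(q, e) plus the estimated future cost f(e, a), the notation used in the Figure 3 caption. Structurally this is the A* priority g + h, so the sketch below ranks paths with an A*-style best-first search. Everything concrete in it is an assumption: the function name `prioritized_paths`, the toy graph, the hard-coded edge costs (standing in for a semantic score such as one minus the similarity between a relation and the question), and the lookup-table estimate of f. The target a is passed in explicitly, as it would be when mining supervision paths; how PathMind estimates f(e, a) when the answer is unknown is not described in this summary.

```python
import heapq
from typing import Callable

# Hypothetical toy KG: entity -> list of (relation, neighbor) edges.
Graph = dict[str, list[tuple[str, str]]]


def prioritized_paths(
    graph: Graph,
    query_entity: str,
    target: str,
    edge_cost: Callable[[str], float],
    future_cost: Callable[[str, str], float],
    k: int = 2,
    max_hops: int = 3,
) -> list[tuple[float, list[str]]]:
    """A*-style best-first search ranking paths by d(q, e) + f(e, a).

    Returns up to k (cost, path) pairs that reach the target, where a path
    alternates entities and relations: [e0, r1, e1, r2, e2, ...].
    """
    # Heap entries: (priority d + f, accumulated cost d, path).
    frontier = [(future_cost(query_entity, target), 0.0, [query_entity])]
    found: list[tuple[float, list[str]]] = []
    while frontier and len(found) < k:
        _, d, path = heapq.heappop(frontier)
        entity = path[-1]
        if entity == target:
            found.append((d, path))
            continue
        if (len(path) - 1) // 2 >= max_hops:
            continue  # hop budget exhausted
        for relation, nbr in graph.get(entity, []):
            if nbr in path[0::2]:
                continue  # skip paths that revisit an entity
            d_next = d + edge_cost(relation)  # accumulated cost d(q, nbr)
            priority = d_next + future_cost(nbr, target)
            heapq.heappush(frontier, (priority, d_next, path + [relation, nbr]))
    return found


kg: Graph = {
    "Obama": [("spouse", "Michelle"), ("born_in", "Honolulu"), ("lived_in", "Chicago")],
    "Michelle": [("born_in", "Chicago")],
}
# Made-up costs for "Where was Obama's wife born?": low = relevant relation.
rel_cost = {"spouse": 0.125, "born_in": 0.25, "lived_in": 0.75}
est = {"Obama": 0.25, "Michelle": 0.125, "Honolulu": 0.5}

paths = prioritized_paths(
    kg, "Obama", "Chicago",
    edge_cost=lambda r: rel_cost[r],
    future_cost=lambda e, a: 0.0 if e == a else est.get(e, 1.0),
)
for cost, p in paths:
    print(cost, " -> ".join(p))
# 0.375 Obama -> spouse -> Michelle -> born_in -> Chicago
# 0.75 Obama -> lived_in -> Chicago
```

Because each estimate in `est` never exceeds the true remaining cost in this toy graph, target-reaching paths come out in increasing total cost, so the two-hop path through `spouse` and `born_in` outranks the direct but less relevant `lived_in` edge. An overestimating f would lose that ordering guarantee.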

Paper Structure

This paper contains 39 sections, 8 equations, 7 figures, 8 tables.

Figures (7)

  • Figure 1: Illustration of LLM-based KGR methods. (a) Retrieval-augmented methods retrieve relevant information from KGs for LLMs. (b) Synergy-augmented methods integrate KGs and LLMs through iterative interaction.
  • Figure 2: The overall framework of our PathMind. (a) Subgraph Retrieval extracts the query subgraph from the KG and encodes it into graph representations. (b) Path Prioritization identifies important reasoning paths according to a path priority function. (c) Knowledge Reasoning enhances LLM reasoning via task-specific instruction tuning and path-wise preference alignment.
  • Figure 3: Illustration of the path priority function on the KG, where the current entity $e$ is evaluated based on the accumulative cost $d(q, e)$ and the estimated future cost $f(e, a)$.
  • Figure 4: Effect of varying the number of selected nodes.
  • Figure 5: Illustrations of interpretable reasoning of PathMind on CWQ (solid lines indicate retrieved important paths).
  • ...and 2 more figures
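The Figure 2 caption names path-wise preference alignment as part of the training strategy, but this summary does not give its objective. Purely as an assumption, the sketch below uses the standard Direct Preference Optimization (DPO) loss, treating a response grounded in an important (prioritized) path as the chosen output and one grounded in a less relevant path as the rejected output. The function name `dpo_pair_loss`, the `beta = 0.1` default, and the log-probabilities are hypothetical; PathMind's actual loss and pair construction may differ.

```python
import math


def dpo_pair_loss(
    policy_chosen: float,
    policy_rejected: float,
    ref_chosen: float,
    ref_rejected: float,
    beta: float = 0.1,
) -> float:
    """Standard DPO loss for one preference pair.

    Each argument is a summed token log-probability log p(y | x) of a full
    response under the trainable policy or the frozen reference model.
    """
    # Implicit reward margin between the chosen and rejected responses.
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # -log(sigmoid(margin)) == softplus(-margin), written to avoid overflow.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Hypothetical log-probabilities: the policy has shifted probability mass
# toward the answer grounded in the prioritized path.
print(dpo_pair_loss(-4.0, -6.0, -5.0, -5.0))
```

At zero margin (policy identical to the reference) the loss equals log 2; it drops below that as the policy raises the likelihood of the chosen response relative to the reference, which is the pressure such an alignment phase would exert toward path-grounded answers.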