PathMind: A Retrieve-Prioritize-Reason Framework for Knowledge Graph Reasoning with Large Language Models
Yu Liu, Xixun Lin, Yanmin Shang, Yangxi Li, Shi Wang, Yanan Cao
TL;DR
PathMind tackles knowledge graph reasoning (KGR) with large language models by introducing a Retrieve-Prioritize-Reason framework that first extracts a query subgraph, then semantically prioritizes important multi-hop reasoning paths, and finally guides LLMs through a dual-phase training regime (instruction tuning and path-wise preference alignment). This approach reduces noise from irrelevant paths and lowers computational cost, while maintaining or improving reasoning accuracy on WebQSP and CWQ. Through extensive ablations and cross-LLM experiments, PathMind demonstrates robust gains in Hits@1 and F1, especially on complex, multi-hop queries, and proves effective across different model backbones. The work offers a practical pathway to scalable, interpretable KGR by combining graph representations, learned path prioritization, and targeted LLM training.
Abstract
Knowledge graph reasoning (KGR) is the task of inferring new knowledge by performing logical deductions on knowledge graphs. Recently, large language models (LLMs) have demonstrated remarkable performance in complex reasoning tasks. Despite this promising success, current LLM-based KGR methods still face two critical limitations. First, existing methods often extract reasoning paths indiscriminately, without assessing their relative importance, which may introduce irrelevant noise that misleads LLMs. Second, while many methods leverage LLMs to dynamically explore potential reasoning paths, they incur high retrieval demands and frequent LLM calls. To address these limitations, we propose PathMind, a novel framework designed to enhance faithful and interpretable reasoning by selectively guiding LLMs with important reasoning paths. Specifically, PathMind follows a "Retrieve-Prioritize-Reason" paradigm. First, it retrieves a query subgraph from the knowledge graph (KG) through the retrieval module. Next, it introduces a path prioritization mechanism that identifies important reasoning paths using a semantic-aware path priority function, which simultaneously considers the accumulative cost and the estimated future cost for reaching the target. Finally, PathMind generates accurate and logically consistent responses via a dual-phase training strategy, including task-specific instruction tuning and path-wise preference alignment. Extensive experiments on benchmark datasets demonstrate that PathMind consistently outperforms competitive baselines while using fewer input tokens, particularly on complex reasoning tasks, by identifying essential reasoning paths.
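
To make the idea of a priority that combines accumulative cost with estimated future cost more concrete, the following is a minimal sketch of a best-first path search in that spirit. It is not the PathMind implementation: the abstract does not specify how the costs are computed, so the function name `prioritize_paths`, the toy triples, and the hand-set `edge_cost` and `future_cost` values are all hypothetical stand-ins for whatever learned, semantic-aware scores the framework actually uses.

```python
import heapq
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


def prioritize_paths(
    triples: List[Triple],
    source: str,
    targets: Set[str],
    edge_cost: Dict[str, float],    # hypothetical per-relation cost (lower = more relevant)
    future_cost: Dict[str, float],  # hypothetical estimate of remaining cost per entity
    max_hops: int = 3,
    top_k: int = 2,
) -> List[Tuple[float, List[Triple]]]:
    """Return up to top_k (accumulated_cost, path) pairs in the order they were popped."""
    # Build an adjacency list from the retrieved subgraph.
    adj: Dict[str, List[Tuple[str, str]]] = {}
    for head, rel, tail in triples:
        adj.setdefault(head, []).append((rel, tail))

    # Heap entries: (priority, tie_breaker, accumulated_cost, node, path).
    # The unique tie_breaker keeps Python from comparing paths on equal priority.
    heap = [(future_cost.get(source, 0.0), 0, 0.0, source, [])]
    counter = 1
    results: List[Tuple[float, List[Triple]]] = []

    while heap and len(results) < top_k:
        _, _, g, node, path = heapq.heappop(heap)
        if node in targets and path:
            results.append((g, path))
            continue
        if len(path) >= max_hops:
            continue
        visited = {source} | {tail for _, _, tail in path}
        for rel, tail in adj.get(node, []):
            if tail in visited:  # avoid cycles within a single path
                continue
            g_new = g + edge_cost.get(rel, 1.0)          # accumulative cost so far
            f_new = g_new + future_cost.get(tail, 0.0)   # plus estimated future cost
            heapq.heappush(heap, (f_new, counter, g_new, tail, path + [(node, rel, tail)]))
            counter += 1
    return results


# Toy subgraph and made-up costs for a query like "In which state was Obama born?"
triples = [
    ("Obama", "born_in", "Honolulu"),
    ("Honolulu", "located_in", "Hawaii"),
    ("Obama", "spouse", "Michelle"),
    ("Michelle", "born_in", "Chicago"),
    ("Chicago", "located_in", "Illinois"),
    ("Obama", "lived_in", "Hawaii"),
]
edge_cost = {"born_in": 0.1, "located_in": 0.2, "spouse": 0.9, "lived_in": 0.6}
future_cost = {"Honolulu": 0.2, "Michelle": 0.3, "Chicago": 0.2}

ranked = prioritize_paths(triples, "Obama", {"Hawaii", "Illinois"}, edge_cost, future_cost)
for cost, path in ranked:
    print(round(cost, 2), path)
```

In this toy setup the two-hop path through `born_in` and `located_in` is popped before the one-hop `lived_in` path because its combined priority is lower, and the `spouse` branch is never expanded. Only the selected paths would then be passed to the LLM, which is one way to read the claim about using fewer input tokens.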
