EvoPath: Evolutionary Meta-path Discovery with Large Language Models for Complex Heterogeneous Information Networks
Shixuan Liu, Haoxiang Cheng, Yunfei Wang, Yue He, Changjun Fan, Zhong Liu
TL;DR
EvoPath addresses the challenge of discovering high-quality meta-paths in complex heterogeneous information networks (HINs) by coupling Large Language Models (LLMs) with an in-context learning framework and an evolution-inspired loop. The framework comprises five components (meta-path sampler, atom selector, prioritized replay buffer, LLM-based meta-path generator, and meta-path cleaner) that generate, evaluate, and refine meta-paths from a minimal set of HIN facts. Extensive experiments on three large real-world HINs demonstrate EvoPath's superior performance in knowledge base completion and link prediction, along with robust inductive capabilities, and are complemented by analyses of the discovered meta-paths. Ablation studies validate the contribution of each module and the method’s robustness to LLM choice. The work highlights the practical potential of explainable, efficient LLM-driven meta-path discovery for complex HIN reasoning tasks and sets the stage for further integration of chain-of-thought reasoning and rule generation in offline knowledge systems.
Abstract
Heterogeneous Information Networks (HINs) encapsulate diverse entity and relation types, and meta-paths provide essential meta-level semantics for knowledge reasoning, although their utility is constrained by the difficulty of discovering them. While Large Language Models (LLMs) offer new prospects for meta-path discovery due to their extensive knowledge encoding and efficiency, their adaptation faces challenges such as corpora bias, lexical discrepancies, and hallucination. This paper pioneers an approach to mitigating these challenges: EvoPath, a framework that leverages LLMs to efficiently identify high-quality meta-paths. Each component of EvoPath is designed to address issues that could lead to knowledge conflicts. With a minimal subset of HIN facts, EvoPath iteratively generates and evolves meta-paths by dynamically replaying meta-paths from a buffer, prioritized by their scores. Comprehensive experiments on three large, complex HINs with hundreds of relations demonstrate that EvoPath enables LLMs to generate high-quality meta-paths through effective prompting, confirming its superior performance in HIN reasoning tasks. Further ablation studies validate the effectiveness of each module within the framework.
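To make the score-prioritized replay concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify how priorities map to sampling, so score-proportional sampling, the capacity-based eviction rule, the class name `PrioritizedReplayBuffer`, the `evolve` helper, and the relation names are all assumptions made for illustration. The `generate` callable stands in for the LLM-based meta-path generator, and `score` stands in for whatever evaluation EvoPath applies.

```python
import random


class PrioritizedReplayBuffer:
    """Toy buffer of meta-paths (tuples of relation names) keyed by score.

    Assumption: scores are non-negative and higher is better.
    """

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.scores = {}  # meta-path tuple -> best score seen so far
        self.rng = random.Random(seed)

    def add(self, meta_path, score):
        key = tuple(meta_path)
        self.scores[key] = max(score, self.scores.get(key, float("-inf")))
        # Assumed eviction rule: drop the lowest-scoring path when over capacity.
        if len(self.scores) > self.capacity:
            worst = min(self.scores, key=self.scores.get)
            del self.scores[worst]

    def sample(self, k):
        """Draw up to k distinct meta-paths, weighted by score (assumed rule)."""
        paths = list(self.scores)
        weights = [self.scores[p] + 1e-6 for p in paths]  # keep weights positive
        chosen = []
        for _ in range(min(k, len(paths))):
            idx = self.rng.choices(range(len(paths)), weights=weights)[0]
            chosen.append(paths.pop(idx))
            weights.pop(idx)
        return chosen

    def best(self, k):
        return sorted(self.scores, key=self.scores.get, reverse=True)[:k]


def evolve(buffer, generate, score, rounds=3, k=2):
    """Replay high-priority paths as exemplars, then score and store new candidates."""
    for _ in range(rounds):
        exemplars = buffer.sample(k)
        for candidate in generate(exemplars):
            buffer.add(candidate, score(candidate))
    return buffer.best(k)


if __name__ == "__main__":
    # Hypothetical relation names and toy stand-ins for the LLM and the scorer.
    buf = PrioritizedReplayBuffer(capacity=4, seed=0)
    buf.add(("bornIn", "locatedIn"), 0.6)
    buf.add(("worksFor", "headquarteredIn"), 0.4)
    toy_generate = lambda exemplars: [p + ("locatedIn",) for p in exemplars]
    toy_score = lambda path: 1.0 / len(path)
    print(evolve(buf, toy_generate, toy_score))
```

In this sketch, higher-scoring meta-paths are more likely to be replayed as in-context exemplars, while lower-scoring ones still have a chance of being selected, which keeps some diversity in what the generator sees.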
