EvoPath: Evolutionary Meta-path Discovery with Large Language Models for Complex Heterogeneous Information Networks

Shixuan Liu, Haoxiang Cheng, Yunfei Wang, Yue He, Changjun Fan, Zhong Liu

TL;DR

EvoPath addresses the challenge of discovering high-quality meta-paths in complex heterogeneous information networks by coupling Large Language Models with an in-context learning framework and an evolution-inspired loop. The framework comprises five components (meta-path sampler, atom selector, prioritized replay buffer, LLM-based meta-path generator, and meta-path cleaner) that generate, evaluate, and refine meta-paths from a minimal subset of HIN facts. Extensive experiments on three large real-world HINs demonstrate EvoPath's superior performance in knowledge base completion and link prediction, along with robust inductive capabilities and insightful meta-path analyses; ablation studies validate the contribution of each module and the method's robustness to LLM choice. The work highlights the practical potential of explainable, efficient LLM-driven meta-path discovery for complex HIN reasoning tasks and sets the stage for further integration of chain-of-thought reasoning and rule generation in offline knowledge systems.

Abstract

Heterogeneous Information Networks (HINs) encapsulate diverse entity and relation types, with meta-paths providing essential meta-level semantics for knowledge reasoning, although their utility is constrained by discovery challenges. While Large Language Models (LLMs) offer new prospects for meta-path discovery due to their extensive knowledge encoding and efficiency, their adaptation faces challenges such as corpora bias, lexical discrepancies, and hallucination. This paper pioneers the mitigation of these challenges by presenting EvoPath, an innovative framework that leverages LLMs to efficiently identify high-quality meta-paths. EvoPath is carefully designed, with each component aimed at addressing issues that could lead to potential knowledge conflicts. With a minimal subset of HIN facts, EvoPath iteratively generates and evolves meta-paths by dynamically replaying meta-paths in the buffer with prioritization based on their scores. Comprehensive experiments on three large, complex HINs with hundreds of relations demonstrate that our framework, EvoPath, enables LLMs to generate high-quality meta-paths through effective prompting, confirming its superior performance in HIN reasoning tasks. Further ablation studies validate the effectiveness of each module within the framework.

Paper Structure

This paper contains 24 sections, 3 equations, 4 figures, 10 tables.

Figures (4)

  • Figure 1: While meta-paths offer effective and explainable reasoning, their application is limited by the difficulties inherent in their discovery. Although LLMs present an opportunity for discovering meta-paths, integrating them faces the challenge of potential knowledge conflicts.
  • Figure 2: Given a HIN, the meta-path sampler initially generates meta-path examples from path instances sampled via random walks, which are then processed by the atom selector and prioritized replay buffer. The atom selector extracts the taxonomy of entity type and relation from example meta-paths and expands them using lexical similarity to construct candidate atoms. Meanwhile, the prioritized replay buffer calculates plausibility scores for meta-path examples to determine their sampling probabilities. Subsequently, by integrating the sampled paths and candidate atoms into prompts, the meta-path generator establishes meta-paths with LLMs. Ultimately, the meta-path cleaner identifies and corrects errors, considering synonyms where possible, before plugging the corrected meta-paths into the buffer. A cyclical evolutionary process encompassing the replay buffer, meta-path generator, and cleaner is then initiated, progressively refining the meta-paths.
  • Figure 3: Inductive link prediction results for EvoPath (Purple) and RotatE (Red). The horizontal axes denote node removal rate, and the shaded areas represent confidence intervals across five runs.
  • Figure 4: Average link prediction performance (shown in ROC-AUC) across different prompt designs over five runs.
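
The first two stages described in the Figure 2 caption (lifting random-walk path instances to meta-paths, then sampling from a score-prioritized buffer) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the toy graph, the function names, the frequency-based placeholder score, and the softmax rule that turns scores into sampling probabilities. This section does not state EvoPath's actual plausibility score or sampling formula, so the sketch should not be read as the authors' implementation.

```python
import math
import random
from collections import Counter

# Toy HIN (hypothetical data): each node has a type, each edge a relation.
NODE_TYPE = {"alice": "Author", "bob": "Author",
             "p1": "Paper", "p2": "Paper", "kdd": "Venue"}
EDGES = {
    "alice": [("writes", "p1")],
    "bob": [("writes", "p1"), ("writes", "p2")],
    "p1": [("published_in", "kdd"), ("written_by", "alice"), ("written_by", "bob")],
    "p2": [("published_in", "kdd"), ("written_by", "bob")],
    "kdd": [("publishes", "p1"), ("publishes", "p2")],
}

def random_walk(start, length, rng):
    """Return a path instance as [node, relation, node, ...]."""
    path, node = [start], start
    for _ in range(length):
        neighbors = EDGES.get(node, [])
        if not neighbors:
            break  # dead end: stop the walk early
        relation, node = rng.choice(neighbors)
        path += [relation, node]
    return path

def to_meta_path(path):
    """Lift a path instance to a meta-path: nodes become their types."""
    return tuple(NODE_TYPE[x] if i % 2 == 0 else x for i, x in enumerate(path))

def sampling_probabilities(scores, temperature=1.0):
    """Softmax over scores (an assumed prioritization rule, not the paper's)."""
    top = max(scores.values())  # subtract the max for numerical stability
    weights = {k: math.exp((s - top) / temperature) for k, s in scores.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

def replay(scores, k, rng, temperature=1.0):
    """Draw k meta-paths from the buffer, favoring higher scores."""
    probs = sampling_probabilities(scores, temperature)
    keys = list(probs)
    return rng.choices(keys, weights=[probs[key] for key in keys], k=k)

rng = random.Random(0)
walks = [random_walk(rng.choice(list(EDGES)), 2, rng) for _ in range(200)]
counts = Counter(to_meta_path(w) for w in walks)
# Placeholder plausibility score: relative frequency among the sampled walks.
scores = {mp: c / len(walks) for mp, c in counts.items()}
examples = replay(scores, k=3, rng=rng)
```

In a full loop, `examples` would be the meta-paths placed into the LLM prompt, and newly generated and cleaned meta-paths would be scored and added back to `scores`. The `temperature` parameter (also an assumption) controls how strongly sampling favors high-scoring meta-paths: lower values concentrate sampling on the top entries, higher values approach uniform sampling.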