Inductive Meta-path Learning for Schema-complex Heterogeneous Information Networks

Shixuan Liu, Changjun Fan, Kewei Cheng, Yunfei Wang, Peng Cui, Yizhou Sun, Zhong Liu

TL;DR

SchemaWalk reframes meta-path learning for schema-complex heterogeneous information networks (HINs) as an inductive, schema-level problem. A reinforcement-learning path-finding agent navigates the schema graph, and schema-level representations of meta-paths reduce the need to enumerate path instances for each relation. By combining an encoder-decoder policy network with a reward based on meta-path coverage and confidence, SchemaWalk discovers high-quality meta-paths for multiple relations, including unseen ones. It performs strongly in multi-relation inductive and transductive KB reasoning, as well as in per-relation experiments on KBs and on a schema-simple HIN. Compared with instance-based and embedding-centric methods, the approach offers better explainability and efficiency, scales to large knowledge bases, remains robust when only partial evidence is available, and extracts meaningful meta-paths. The work points to practical uses in KB reasoning, link prediction, and other downstream tasks, and suggests future extensions such as task-specific reward design, faster meta-path evaluation, and temporal or dynamic HINs.
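The schema-graph navigation summarized above can be pictured with a minimal sketch of the environment only. It assumes a toy schema (the types and relations in `SCHEMA_EDGES` are hypothetical, loosely following the isCitizenOf example of Figure 2), adds inverse edges under a hypothetical `_inv` suffix, and replaces the learned encoder-decoder policy with a uniform random choice. States are entity types, actions are relation edges, and an episode yields a meta-path when the walk reaches the query's target type. Masking the query relation and stopping at the first arrival are also assumptions, not details taken from the paper.

```python
import random

# Hypothetical toy schema graph: (head type, relation type, tail type).
SCHEMA_EDGES = [
    ("Person", "graduatedFrom", "University"),
    ("University", "locatedIn", "Country"),
    ("Person", "bornIn", "City"),
    ("City", "locatedIn", "Country"),
    ("Person", "isCitizenOf", "Country"),
]


def schema_actions(edges):
    """Map each entity type to its moves (relation, next type),
    including hypothetical '_inv' inverse moves."""
    actions = {}
    for src, rel, dst in edges:
        actions.setdefault(src, []).append((rel, dst))
        actions.setdefault(dst, []).append((rel + "_inv", src))
    return actions


def rollout(actions, query, max_steps=3, rng=random):
    """Run one episode for a query (relation, head type, tail type).

    A uniform random choice stands in for the learned policy. Returns the
    meta-path as a list of relations if the walk reaches the tail type,
    otherwise None."""
    query_rel, head_type, tail_type = query
    current, path = head_type, []
    for _ in range(max_steps):
        # Assumption: mask the query relation so the trivial one-hop answer is unavailable.
        moves = [m for m in actions.get(current, []) if m[0] != query_rel]
        if not moves:
            break
        rel, current = rng.choice(moves)
        path.append(rel)
        if current == tail_type:  # assumption: stop at the first arrival
            return path
    return None


actions = schema_actions(SCHEMA_EDGES)
rng = random.Random(0)
found = set()
for _ in range(200):
    path = rollout(actions, ("isCitizenOf", "Person", "Country"), rng=rng)
    if path is not None:
        found.add(tuple(path))
print(sorted(found))
```

In this sketch each step only inspects the relation types leaving the current entity type, so its cost does not depend on how many entities the KB holds; the instance graph would be consulted only when a discovered meta-path is scored.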

Abstract

Heterogeneous Information Networks (HINs) are information networks with multiple types of nodes and edges. The concept of a meta-path, i.e., a sequence of entity types and relation types connecting two entities, was proposed to provide meta-level, explainable semantics for various HIN tasks. Traditionally, meta-paths have been used primarily in schema-simple HINs, e.g., bibliographic networks with only a few entity types, where meta-paths are often enumerated with domain knowledge. However, the adoption of meta-paths for schema-complex HINs, such as knowledge bases (KBs) with hundreds of entity and relation types, has been limited by the computational complexity of meta-path enumeration. Additionally, assessing meta-paths effectively requires enumerating the relevant path instances, which further complicates meta-path learning. To address these challenges, we propose SchemaWalk, an inductive meta-path learning framework for schema-complex HINs. We represent meta-paths with schema-level representations to support learning meta-path scores for varying relations, mitigating the need for exhaustive path-instance enumeration for each relation. Further, we design a reinforcement-learning-based path-finding agent that directly navigates the network schema (i.e., the schema graph) to learn policies for establishing meta-paths with high coverage and confidence for multiple relations. Extensive experiments on real data sets demonstrate the effectiveness of our proposed paradigm.
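The abstract scores meta-paths by coverage and confidence computed from path instances. The paper's exact formulas are not reproduced in this summary, so the sketch below assumes rule-mining-style definitions: coverage is the fraction of query-relation facts (h, t) connected by at least one instance of the meta-path, and confidence is the fraction of (h, t) pairs connected by the meta-path (starting from heads of the query relation) that are actual facts. The meta-path is treated as a plain relation sequence, so entity-type constraints are ignored; `_inv` marks a hypothetical inverse relation, and the toy triples are made up for illustration.

```python
from collections import defaultdict


def build_adjacency(triples):
    """Index the instance graph by (entity, relation); add hypothetical '_inv' inverse edges."""
    adj = defaultdict(set)
    for h, r, t in triples:
        adj[(h, r)].add(t)
        adj[(t, r + "_inv")].add(h)
    return adj


def reachable(adj, start, relations):
    """Entities reachable from `start` by following `relations` in order (path instances)."""
    frontier = {start}
    for r in relations:
        frontier = {t for e in frontier for t in adj.get((e, r), ())}
        if not frontier:
            break
    return frontier


def coverage_and_confidence(triples, query_rel, meta_path):
    """Assumed rule-mining-style scores, not necessarily the paper's exact definitions."""
    adj = build_adjacency(triples)
    facts = {(h, t) for h, r, t in triples if r == query_rel}
    heads = {h for h, _ in facts}
    # Pairs (h, t) connected by at least one instance of the meta-path.
    predicted = {(h, t) for h in heads for t in reachable(adj, h, meta_path)}
    hits = facts & predicted
    coverage = len(hits) / len(facts) if facts else 0.0
    confidence = len(hits) / len(predicted) if predicted else 0.0
    return coverage, confidence


TRIPLES = [
    ("MarieCurie", "graduatedFrom", "UnivOfParis"),
    ("UnivOfParis", "locatedIn", "France"),
    ("MarieCurie", "isCitizenOf", "France"),
    ("AlanTuring", "graduatedFrom", "Princeton"),
    ("Princeton", "locatedIn", "USA"),
    ("AlanTuring", "isCitizenOf", "UK"),
]
print(coverage_and_confidence(TRIPLES, "isCitizenOf", ["graduatedFrom", "locatedIn"]))
# (0.5, 0.5)
```

On these triples the meta-path graduatedFrom → locatedIn recovers one of the two citizenship facts and produces one wrong pair, so both scores are 0.5. Both quantities require walking the instance graph, which is the per-relation cost that the schema-level representations are meant to mitigate.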

Paper Structure (30 sections, 9 equations, 7 figures, 9 tables)

Figures (7)

  • Figure 1: An illustration of a two-view heterogeneous information network. Top: a schema graph $\mathcal{G}_S$ whose nodes represent entity types. Bottom: an instance graph $\mathcal{G}_I$ whose nodes represent specific entities. Each entity in $\mathcal{G}_I$ is linked to one or more entity types in $\mathcal{G}_S$. A path in $\mathcal{G}_S$ (one is shown in bold black) is called a meta-path. Learning meta-paths typically requires evidence from the corresponding path instances, shown as bold black lines in $\mathcal{G}_I$.
  • Figure 2: Overview of SchemaWalk. The left part shows the interactions between the two views of the HIN. The upper schema graph provides the learning environment, in which the agent is trained to navigate to the target type node for a query, e.g., isCitizenOf(Person, Country)=?, and to establish meta-paths, e.g., $\text{Person} \xrightarrow{\text{GraduatedFrom}} \text{University} \xrightarrow{\text{LocatedIn}} \text{Country}$. The lower instance graph provides the rewards: the coverage and confidence of a discovered meta-path are calculated from the instance paths that satisfy it, e.g., $\text{Marie Curie} \xrightarrow{\text{GraduatedFrom}} \text{Univ. of Paris} \xrightarrow{\text{LocatedIn}} \text{France}$. These rewards are then used to train the agent. Red arrows denote the discovered meta-path and the related instance paths, dashed arrows denote existing relations, and dotted arrows denote inverse relations. The right part shows the architecture of the encoder-decoder policy network.
  • Figure 3: The three experimental settings adopted in this paper, illustrated with relations and entity pairs from YAGO26K-906.
  • Figure 4: Performance difference of SchemaWalk between the multi-relation transductive setting and the multi-relation inductive setting.
  • Figure 5: Entity-level inductive experiment results for SchemaWalk (blue) and RotatE (red). The top three charts show ROC-AUC values and the bottom three show AP values. The horizontal axes show the node removal rate, and shaded areas show confidence intervals over 5 runs.
  • ...and 2 more figures
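Figure 1 defines a meta-path as a path in the schema graph $\mathcal{G}_S$. For contrast with the learned agent, the exhaustive meta-path enumeration that the abstract identifies as the bottleneck for schema-complex HINs can be sketched as a depth-first search over relation sequences up to a length bound. This is a generic illustration, not code from the paper; inverse edges use a hypothetical `_inv` suffix, and paths are allowed to revisit types.

```python
from collections import defaultdict


def enumerate_meta_paths(edges, source_type, target_type, max_len):
    """Return every relation sequence of length 1..max_len that leads from
    source_type to target_type in a schema graph given as (type, relation, type) edges."""
    # Adjacency over entity types; each schema edge is also usable backwards
    # through a hypothetical inverse relation named "<relation>_inv".
    out = defaultdict(list)
    for src, rel, dst in edges:
        out[src].append((rel, dst))
        out[dst].append((rel + "_inv", src))

    results = []

    def dfs(current, path):
        # Record the path whenever it ends at the target type (it may continue past it).
        if path and current == target_type:
            results.append(tuple(path))
        if len(path) == max_len:
            return
        for rel, nxt in out[current]:
            path.append(rel)
            dfs(nxt, path)
            path.pop()

    dfs(source_type, [])
    return results


SCHEMA = [
    ("Person", "graduatedFrom", "University"),
    ("University", "locatedIn", "Country"),
]
print(enumerate_meta_paths(SCHEMA, "Person", "Country", 2))
# [('graduatedFrom', 'locatedIn')]
print(len(enumerate_meta_paths(SCHEMA, "Person", "Country", 4)))
# 3
```

Even on this two-edge schema the count grows with the length bound (one meta-path of length at most 2, three of length at most 4). With hundreds of relation types the number of candidates grows roughly as the branching factor raised to the path length, and each candidate would still need instance-level evidence to be scored.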