HetFS: A Method for Fast Similarity Search with Ad-hoc Meta-paths on Heterogeneous Information Networks

Xuqi Mao, Zhenyi Chen, Zhenying He, Yinan Jing, Kai Zhang, X. Sean Wang

TL;DR

HetFS addresses fast, ad-hoc meta-path similarity search on heterogeneous information networks by integrating content, node centrality, edge contributions, and structural information into a SimRank-based framework. It projects heterogeneous content into a common latent space, weights nodes and edges by type-aware centrality and contribution, and uses a path-enumeration strategy with tours to compute similarity under user-specified meta-paths. The method achieves high accuracy and fast response times, outperforming HGNNs and path-based methods on link prediction, clustering, and classification while enabling rapid ad-hoc query switching. This work advances practical heterogeneous graph mining by delivering flexible, content-aware, meta-path-constrained similarity with scalable performance.

Abstract

Numerous real-world information networks form Heterogeneous Information Networks (HINs) with diverse objects and relations represented as nodes and edges in heterogeneous graphs. Similarity between nodes quantifies how closely two nodes resemble each other, mainly depending on the similarity of the nodes they are connected to, recursively. Users may be interested in only specific types of connections in the similarity definition, represented as meta-paths, i.e., a sequence of node and edge types. Existing Heterogeneous Graph Neural Network (HGNN)-based similarity search methods may accommodate meta-paths, but require retraining for different meta-paths. Conversely, existing path-based similarity search methods may switch flexibly between meta-paths but often suffer from lower accuracy, as they rely solely on path information. This paper proposes HetFS, a Fast Similarity method for ad-hoc queries with user-given meta-paths on Heterogeneous information networks. HetFS provides similarity results based on path information that satisfies the meta-path restriction, as well as node content. Extensive experiments demonstrate the effectiveness and efficiency of HetFS in addressing ad-hoc queries, outperforming state-of-the-art HGNNs and path-based approaches, and showing strong performance in downstream applications, including link prediction, node classification, and clustering.
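The recursive notion of similarity described above (two nodes are similar when the nodes they connect to are similar) is the idea behind classical SimRank, which the TL;DR names as the basis of HetFS. The sketch below implements only plain, homogeneous SimRank and is not the HetFS algorithm: it ignores node content, centrality, edge contribution, and meta-path constraints. The function name `simrank`, the decay value 0.8, and the toy graph are illustrative assumptions.

```python
from itertools import product


def simrank(in_neighbors, decay=0.8, iterations=10):
    """Classical SimRank (not HetFS) on a graph given as {node: [in-neighbors]}.

    Every node that appears as an in-neighbor must also be a key of the dict.
    """
    nodes = list(in_neighbors)
    # Start from the identity: a node is fully similar only to itself.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, repeat=2)}
    for _ in range(iterations):
        new_sim = {}
        for a, b in product(nodes, repeat=2):
            if a == b:
                new_sim[(a, b)] = 1.0
                continue
            ia, ib = in_neighbors[a], in_neighbors[b]
            if not ia or not ib:
                # Nodes without in-neighbors have no evidence of similarity.
                new_sim[(a, b)] = 0.0
                continue
            # Average similarity over all pairs of in-neighbors, damped by `decay`.
            total = sum(sim[(i, j)] for i in ia for j in ib)
            new_sim[(a, b)] = decay * total / (len(ia) * len(ib))
        sim = new_sim
    return sim


# Hypothetical toy graph: actor a1 points to movies m1 and m2, actor a2 to m2 only.
graph = {"a1": [], "a2": [], "m1": ["a1"], "m2": ["a1", "a2"]}
scores = simrank(graph)
print(scores[("m1", "m2")])
```

In this toy graph the two movies share one of two possible in-neighbor pairs, so their score settles at `decay / 2`. A meta-path-constrained method would additionally restrict which neighbors are allowed to contribute at each step.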

Paper Structure

This paper contains 24 sections, 1 theorem, 11 equations, 6 figures, 8 tables, 2 algorithms.

Key Result

Theorem 1

In HetFS, the average time required to compute a similarity score between a given node and each of the other nodes involved in a user query with ad-hoc meta-paths is bounded by $\mathcal{O}(\overline{d}^lm)$. The total time for processing the query is bounded by $\mathcal{O}(\overline{d}^lm + \log k)$.
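The symbols in Theorem 1 are not defined in this excerpt. A plausible reading, stated here as an assumption, is that $\overline{d}$ is the average node degree and $l$ the meta-path length, so that $\overline{d}^l$ reflects how many meta-path-constrained walks can start at one node; the roles of $m$ and $k$ are left uninterpreted. The sketch below only illustrates that exponential growth on a hypothetical toy graph. It uses the maximum degree so that the bound holds strictly, whereas the theorem speaks of the average case. The helper `count_walks` and the graph are illustrative, not taken from the paper.

```python
def count_walks(adj, node_type, start, meta_path):
    """Count walks from `start` whose node types follow `meta_path`."""
    if node_type[start] != meta_path[0]:
        return 0
    # frontier maps each reachable node to the number of walks ending there.
    frontier = {start: 1}
    for wanted in meta_path[1:]:
        nxt = {}
        for node, ways in frontier.items():
            for nb in adj[node]:
                if node_type[nb] == wanted:
                    nxt[nb] = nxt.get(nb, 0) + ways
        frontier = nxt
    return sum(frontier.values())


# Hypothetical movie-actor graph (undirected, stored as adjacency lists).
adj = {
    "m1": ["a1", "a2"], "m2": ["a1"], "m3": ["a2"],
    "a1": ["m1", "m2"], "a2": ["m1", "m3"],
}
node_type = {n: ("M" if n.startswith("m") else "A") for n in adj}
d_max = max(len(nbrs) for nbrs in adj.values())

for meta_path in ("MAM", "MAMAM"):
    hops = len(meta_path) - 1  # number of edges in the meta-path
    walks = count_walks(adj, node_type, "m1", meta_path)
    print(meta_path, walks, "<=", d_max ** hops)
```

Each additional hop multiplies the number of candidate walks by at most the degree, which is why the path length appears in the exponent.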

Figures (6)

  • Figure 1: Examples of HINs. (a) depicts a movie HIN comprising objects of different types, such as actors (A), movies (M), and directors (D), each carrying various properties, including actor ID, actor name, author order, movie ID, movie name, director ID, and director name. These objects are interconnected through different types of relations, such as acting, acted-by, directed-by, and directing. "MAM" (Movie-Actor-Movie) and "DMD" (Director-Movie-Director) are two example meta-paths in this network. (b) showcases objects and relations within an academic network, with "PAP" (Paper-Author-Paper) and "PVP" (Paper-Venue-Paper) as two example meta-paths.
  • Figure 2: The overall architecture of HetFS. HetFS first projects heterogeneous content information, e.g., textual data, from each node into a unified domain via a transformation function. The content information is then integrated with node centrality, edge contribution, and structural topology to produce the final similarity score.
  • Figure 3: The contribution graph of the movie network.
  • Figure 4: A running example of the movie network.
  • Figure 5: The time cost under different meta-paths on "DBLP" and "IMDB" datasets.
  • ...and 1 more figure
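The meta-paths named in the Figure 1 caption ("MAM", "DMD") can be illustrated with a minimal sketch that enumerates the concrete paths matching a meta-path in a small movie HIN. The graph, the node names, and the helper `meta_path_instances` are hypothetical, and the sketch matches node types only, whereas the paper defines a meta-path over both node and edge types. It shows the constraint itself, not the HetFS path-enumeration strategy.

```python
def meta_path_instances(adj, node_type, start, meta_path):
    """Return every path from `start` whose node types spell out `meta_path`."""
    if node_type[start] != meta_path[0]:
        return []
    paths = [[start]]
    for wanted in meta_path[1:]:
        # Extend each partial path by neighbors of the required type only.
        paths = [
            path + [nb]
            for path in paths
            for nb in adj[path[-1]]
            if node_type[nb] == wanted
        ]
    return paths


# Hypothetical movie HIN: movies (M), actors (A), directors (D).
adj = {
    "m1": ["a1", "d1"], "m2": ["a1", "a2", "d2"], "m3": ["a2", "d2"],
    "a1": ["m1", "m2"], "a2": ["m2", "m3"],
    "d1": ["m1"], "d2": ["m2", "m3"],
}
node_type = {n: n[0].upper() for n in adj}  # "m1" -> "M", "a1" -> "A", "d1" -> "D"

mam = meta_path_instances(adj, node_type, "m1", "MAM")
mdm = meta_path_instances(adj, node_type, "m2", "MDM")
print(mam)
print(mdm)
```

Switching from "MAM" to "MDM" changes which neighbors of a movie count as evidence: co-starring actors in the first case, shared directors in the second. This is the kind of ad-hoc switch that the abstract says should not require retraining.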

Theorems & Definitions (4)

  • Definition 1: Information Network
  • Definition 2: Meta-Path
  • Theorem 1
  • proof