HetFS: A Method for Fast Similarity Search with Ad-hoc Meta-paths on Heterogeneous Information Networks
Xuqi Mao, Zhenyi Chen, Zhenying He, Yinan Jing, Kai Zhang, X. Sean Wang
TL;DR
HetFS addresses fast similarity search under ad-hoc meta-paths on heterogeneous information networks by integrating content, node centrality, edge contributions, and structural information into a SimRank-based framework. It projects heterogeneous content into a common latent space, weights nodes and edges by type-aware centrality and contribution, and uses a path-enumeration strategy with tours to compute similarity under user-specified meta-paths. The method outperforms HGNNs and path-based methods in accuracy and response time on link prediction, node classification, and clustering, and it lets users switch between ad-hoc queries quickly. The result is a flexible, content-aware, meta-path-constrained similarity measure with scalable performance for practical heterogeneous graph mining.
Abstract
Many real-world information networks are Heterogeneous Information Networks (HINs), whose diverse objects and relations are represented as nodes and edges in heterogeneous graphs. Similarity between two nodes quantifies how closely they resemble each other and is defined recursively, depending mainly on the similarity of the nodes they are connected to. Users may be interested in only specific types of connections in the similarity definition; these are represented as meta-paths, i.e., sequences of node and edge types. Existing Heterogeneous Graph Neural Network (HGNN)-based similarity search methods can accommodate meta-paths but require retraining for each different meta-path. Conversely, existing path-based similarity search methods can switch flexibly between meta-paths but often suffer from lower accuracy, as they rely solely on path information. This paper proposes HetFS, a Fast Similarity method for ad-hoc queries with user-given meta-paths on Heterogeneous information networks. HetFS provides similarity results based on both node content and path information that satisfies the meta-path restriction. Extensive experiments demonstrate the effectiveness and efficiency of HetFS in addressing ad-hoc queries: it outperforms state-of-the-art HGNNs and path-based approaches and shows strong performance in downstream applications, including link prediction, node classification, and clustering.
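
To make the notion of a meta-path restriction concrete, the following is a minimal sketch of enumerating the paths that satisfy a user-given meta-path on a toy bibliographic network. It is not the HetFS algorithm: it ignores content, centrality, and edge contributions, and only illustrates how a meta-path (a sequence of node and edge types) filters which paths count toward similarity. The node names, type labels, edge list, and the helper functions `build_adjacency` and `enumerate_paths` are all hypothetical and chosen for illustration.

```python
from collections import defaultdict

# Hypothetical toy HIN: each node has a type, each edge has a type.
NODE_TYPE = {
    "a1": "Author", "a2": "Author", "a3": "Author",
    "p1": "Paper", "p2": "Paper", "p3": "Paper",
    "v1": "Venue",
}
EDGES = [
    ("a1", "writes", "p1"),
    ("a2", "writes", "p1"),
    ("a2", "writes", "p2"),
    ("a3", "writes", "p3"),
    ("p1", "published_in", "v1"),
    ("p3", "published_in", "v1"),
]


def build_adjacency(edges):
    """Map each node to its (edge_type, neighbor) pairs.

    Assumption: every edge may be traversed in both directions.
    """
    adj = defaultdict(list)
    for src, etype, dst in edges:
        adj[src].append((etype, dst))
        adj[dst].append((etype, src))
    return adj


def enumerate_paths(adj, node_type, start, meta_path):
    """Return all paths from `start` that follow `meta_path`.

    `meta_path` alternates node types and edge types:
    [node_type, edge_type, node_type, ..., node_type].
    """
    node_types = meta_path[0::2]
    edge_types = meta_path[1::2]
    if node_type[start] != node_types[0]:
        return []
    paths = [[start]]
    # Extend every partial path by one typed hop per meta-path step.
    for etype, ntype in zip(edge_types, node_types[1:]):
        extended = []
        for path in paths:
            for e, nbr in adj[path[-1]]:
                if e == etype and node_type[nbr] == ntype:
                    extended.append(path + [nbr])
        paths = extended
    return paths


ADJ = build_adjacency(EDGES)

# Two ad-hoc meta-paths: co-authorship (APA) and same-venue (APVPA).
APA = ["Author", "writes", "Paper", "writes", "Author"]
APVPA = ["Author", "writes", "Paper", "published_in", "Venue",
         "published_in", "Paper", "writes", "Author"]

apa_paths = enumerate_paths(ADJ, NODE_TYPE, "a1", APA)
apvpa_paths = enumerate_paths(ADJ, NODE_TYPE, "a1", APVPA)
print({p[-1] for p in apa_paths})
print({p[-1] for p in apvpa_paths})
```

In this toy graph, author `a3` is reachable from `a1` only under the same-venue meta-path, not under co-authorship. Switching the meta-path changes the candidate set at query time with no retraining step, which is the flexibility the abstract attributes to path-based methods.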
