Long-range Meta-path Search on Large-scale Heterogeneous Graphs
Chao Li, Zijie Guo, Qiuting He, Hao Xu, Kun He
TL;DR
The paper addresses how to exploit long-range dependencies in large-scale heterogeneous graphs. It introduces LMSPS, which uses a progressive sampling strategy to shrink the search space from $K$ candidate meta-paths to a compact set, followed by a sampling-based evaluation that selects $M$ effective meta-paths. The final model is an MLP-based target network that concatenates the representations from the selected meta-paths, which reduces computational cost and mitigates over-smoothing. Experiments on nine heterogeneous datasets, including the large-scale OGBN-MAG, show that LMSPS outperforms state-of-the-art baselines, with notable gains in sparse, long-range scenarios (e.g., LMSPS achieves 54.83% test accuracy on OGBN-MAG versus 51.45% for the best competitor). The results also indicate that a small, data-driven set of meta-paths can generalize across heterogeneous graph neural networks (HGNNs), offering a practical way to exploit long-range information in heterogeneous graphs.
Abstract
Utilizing long-range dependency, a concept extensively studied in homogeneous graphs, remains underexplored in heterogeneous graphs, especially large ones. Doing so poses two significant challenges: reducing computational costs while maximizing effective information utilization in the presence of heterogeneity, and overcoming the over-smoothing issue in graph neural networks. To address this gap, we investigate the importance of different meta-paths and introduce an automatic framework for utilizing long-range dependency on heterogeneous graphs, denoted as Long-range Meta-path Search through Progressive Sampling (LMSPS). Specifically, we develop a search space with all meta-paths related to the target node type. By employing a progressive sampling algorithm, LMSPS dynamically shrinks the search space with hop-independent time complexity. Through a sampling evaluation strategy, LMSPS conducts a specialized and effective meta-path selection, so that retraining uses only effective meta-paths, thus mitigating costs and over-smoothing. Extensive experiments across diverse heterogeneous datasets validate LMSPS's capability in discovering effective long-range meta-paths, surpassing state-of-the-art methods. Our code is available at https://github.com/JHL-HUST/LMSPS.
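
To illustrate why the search space of meta-paths needs to be shrunk, the following minimal sketch enumerates the meta-paths reachable from a target node type on a toy schema and counts how many exist at each hop. It is not taken from the LMSPS code base: the schema, the single-letter type names (P, A, F, I), the convention that a meta-path starts at the target type, and the function names `enumerate_meta_paths` and `count_by_hops` are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical toy schema (an assumption, loosely modeled on an academic graph):
# P = paper, A = author, F = field, I = institution.
# Each key maps a node type to the node types it is directly connected to.
SCHEMA: Dict[str, List[str]] = {
    "P": ["P", "A", "F"],
    "A": ["P", "I"],
    "F": ["P"],
    "I": ["A"],
}


def enumerate_meta_paths(schema: Dict[str, List[str]], target: str, max_hops: int) -> List[str]:
    """Return every meta-path with 1..max_hops hops that starts at `target`.

    A meta-path is written as a string of node types, e.g. "PAP".
    """
    frontier = [target]  # the 0-hop path containing only the target type
    found: List[str] = []
    for _ in range(max_hops):
        # Extend each path by every node type adjacent to its last type.
        frontier = [path + nxt for path in frontier for nxt in schema[path[-1]]]
        found.extend(frontier)
    return found


def count_by_hops(schema: Dict[str, List[str]], target: str, max_hops: int) -> List[int]:
    """Return the number of meta-paths with exactly 1, 2, ..., max_hops hops."""
    paths = enumerate_meta_paths(schema, target, max_hops)
    # A path with h hops contains h + 1 node types.
    return [sum(1 for p in paths if len(p) == h + 1) for h in range(1, max_hops + 1)]


if __name__ == "__main__":
    print(count_by_hops(SCHEMA, "P", 3))  # [3, 6, 13]
```

Even on this four-type schema the number of candidates roughly doubles with each additional hop, which is the growth that motivates searching for a small subset of meta-paths rather than training on all of them.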
