Long-range Meta-path Search on Large-scale Heterogeneous Graphs

Chao Li, Zijie Guo, Qiuting He, Hao Xu, Kun He

TL;DR

The paper addresses how to exploit long-range dependency in large-scale heterogeneous graphs. It introduces LMSPS, which uses progressive sampling to shrink a search space of $K$ candidate meta-paths, a number that grows exponentially with the maximum hop, down to a compact set, and then uses a sampling-based evaluation to select $M$ effective meta-paths from that set. The final model is an MLP-based target network that concatenates the representations of the chosen meta-paths, which reduces computational cost and mitigates over-smoothing. Empirical results on nine heterogeneous datasets, including the large OGBN-MAG, show that LMSPS outperforms state-of-the-art baselines, with notable gains in sparse, long-range scenarios (e.g., LMSPS achieves 54.83% test accuracy on OGBN-MAG versus 51.45% for the best competitor). The results also indicate that a small, data-driven set of meta-paths can generalize across HGNNs, offering a practical way to exploit long-range information in heterogeneous graphs.

Abstract

Utilizing long-range dependency, a concept extensively studied in homogeneous graphs, remains underexplored in heterogeneous graphs, especially large ones. It poses two significant challenges: reducing computational costs while maximizing effective information utilization in the presence of heterogeneity, and overcoming the over-smoothing issue in graph neural networks. To address this gap, we investigate the importance of different meta-paths and introduce an automatic framework for utilizing long-range dependency on heterogeneous graphs, denoted as Long-range Meta-path Search through Progressive Sampling (LMSPS). Specifically, we develop a search space with all meta-paths related to the target node type. By employing a progressive sampling algorithm, LMSPS dynamically shrinks the search space with hop-independent time complexity. Through a sampling evaluation strategy, LMSPS conducts a specialized and effective meta-path selection, leading to retraining with only effective meta-paths, thus mitigating costs and over-smoothing. Extensive experiments across diverse heterogeneous datasets validate LMSPS's capability in discovering effective long-range meta-paths, surpassing state-of-the-art methods. Our code is available at https://github.com/JHL-HUST/LMSPS.
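
The two search steps named in the abstract, progressive sampling and sampling evaluation, can be illustrated with a toy sketch. This is not the authors' implementation (see the linked repository for that). The linear shrinking schedule, the running-mean loss estimate, and the names `progressive_search`, `sampling_evaluation`, `toy_loss`, and `USEFUL` are all assumptions made for illustration, and the hypothetical `loss_fn` stands in for whatever signal the real model uses to rate a set of meta-paths.

```python
import random


def progressive_search(num_paths, num_sampled, final_size, steps, loss_fn, seed=0):
    """Shrink a space of candidate meta-path indices down to final_size entries."""
    rng = random.Random(seed)
    totals = [0.0] * num_paths   # summed loss of the subsets each path appeared in
    counts = [0] * num_paths     # number of sampled subsets that contained each path
    space = list(range(num_paths))

    def mean_loss(path):
        # Unvisited paths get -inf so they are never pruned before being tried.
        return totals[path] / counts[path] if counts[path] else float("-inf")

    for step in range(steps):
        # Linear schedule (an assumption): size goes from num_paths to final_size.
        frac = (step + 1) / steps
        size = max(final_size, round(num_paths - frac * (num_paths - final_size)))
        space = sorted(space, key=mean_loss)[:size]
        # Only a fixed-size subset is evaluated per step, whatever the space size.
        subset = rng.sample(space, min(num_sampled, len(space)))
        loss = loss_fn(subset)
        for path in subset:
            totals[path] += loss
            counts[path] += 1
    return space


def sampling_evaluation(space, num_selected, trials, loss_fn, seed=0):
    """Sample several subsets from the shrunken space and keep the lowest-loss one."""
    rng = random.Random(seed)
    best_subset, best_loss = None, float("inf")
    for _ in range(trials):
        subset = sorted(rng.sample(space, num_selected))
        loss = loss_fn(subset)
        if loss < best_loss:
            best_subset, best_loss = subset, loss
    return best_subset, best_loss


USEFUL = {3, 7, 11}  # hypothetical "effective" meta-paths in a toy problem


def toy_loss(subset):
    # Lower is better: each useful meta-path in the subset reduces the loss by 1.
    return -sum(1 for path in subset if path in USEFUL)


space = progressive_search(num_paths=20, num_sampled=4, final_size=8,
                           steps=200, loss_fn=toy_loss)
chosen, loss = sampling_evaluation(space, num_selected=4, trials=50, loss_fn=toy_loss)
print("remaining space:", sorted(space))
print("chosen meta-paths:", chosen, "loss:", loss)
```

In this sketch each step evaluates only `num_sampled` meta-paths, so the per-step evaluation cost does not depend on how many candidates exist. That is the intuition behind searching a space whose size is exponential in the maximum hop, though the actual method's complexity argument is given in the paper, not here.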

Paper Structure

This paper contains 37 sections, 9 equations, 6 figures, 11 tables, and 1 algorithm.

Figures (6)

  • Figure 1: Analysis of the importance of different meta-paths. (a) shows the results after removing a single meta-path on DBLP. (b) shows the performance when using a single meta-path on DBLP. (c) shows the performance after removing a subset of meta-paths on ACM.
  • Figure 2: The overall framework of LMSPS. Based on the progressive sampling and sampling evaluation in the search stage, the training stage employs $M$ effective meta-paths instead of the full $K$ target-node-related meta-paths. The figure illustrates the aggregation of meta-paths with maximum hop $2$, i.e., $0$-, $1$-, and $2$-hop meta-paths. The weight updates of feature projection are not shown for ease of illustration.
  • Figure 3: Illustration of (a) performance, (b) memory cost, and (c) average training time of Simple-HGN, SeHGNN, and LMSPS relative to the maximum hop or layer on DBLP. The gray dotted line in (a) indicates the number of target-node-related meta-paths under different maximum hops, which grows exponentially with the maximum hop.
  • Figure 4: Micro-F1 scores, time consumption, and parameters of various HGNNs on DBLP and ACM. GTN is omitted for ease of illustration because its time consumption and parameter count are much larger than those of the other methods.
  • Figure 5: Micro-F1 with respect to different hyper-parameter $M$ on DBLP, IMDB, and ACM.
  • ...and 1 more figure
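
The captions of Figures 2 and 3 refer to target-node-related meta-paths and note that their number grows exponentially with the maximum hop. A minimal sketch of that growth follows. It assumes a hypothetical DBLP-like schema with Author (A), Paper (P), Term (T), and Venue (V) node types, and treats a meta-path as any sequence of node types that ends at the target type. The function name `enumerate_meta_paths` and this exact convention are illustrative assumptions, not taken from the paper.

```python
def enumerate_meta_paths(schema, target, max_hop):
    """Return all node-type sequences of at most max_hop hops that end at target."""
    paths = [(target,)]      # the 0-hop meta-path is the target type itself
    frontier = [(target,)]
    for _ in range(max_hop):
        # Prepend every neighbouring type to each path found in the previous round.
        frontier = [(nbr,) + path for path in frontier for nbr in schema[path[0]]]
        paths.extend(frontier)
    return paths


# Hypothetical DBLP-like schema: Author-Paper, Paper-Term, Paper-Venue relations.
schema = {"A": ["P"], "P": ["A", "T", "V"], "T": ["P"], "V": ["P"]}

for hop in range(5):
    count = len(enumerate_meta_paths(schema, "A", hop))
    print(f"max hop {hop}: {count} meta-paths")
```

Under this toy schema the count is 5 at maximum hop 2 and 17 at maximum hop 4, roughly tripling every two hops. Schemas with more node and edge types grow faster, which is why selecting a small set of meta-paths matters at long range.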