Long-range Meta-path Search on Large-scale Heterogeneous Graphs

Chao Li; Zijie Guo; Qiuting He; Hao Xu; Kun He

TL;DR

The paper addresses how to exploit long-range dependency in large-scale heterogeneous graphs. It introduces LMSPS, which uses progressive sampling to shrink the search space of $K$ target-node-related meta-paths, whose number grows exponentially with the maximum hop, to a compact candidate set, and then uses a sampling-based evaluation to select $M$ effective meta-paths. The final model is an MLP-based target network that concatenates representations from the chosen meta-paths, which reduces computational cost and mitigates over-smoothing. On nine heterogeneous datasets, including the large OGBN-MAG, LMSPS outperforms state-of-the-art baselines, with notable gains in sparse, long-range scenarios (e.g., 54.83% test accuracy on OGBN-MAG versus 51.45% for the best competitor). The results also suggest that a small, data-driven set of meta-paths can generalize across HGNNs, offering a practical way to exploit long-range information in heterogeneous graphs.
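The two-stage search summarized above can be illustrated with a toy sketch. This is not the authors' implementation (their released code is the reference); it only mirrors the general shape of the idea. The function `progressive_search`, its parameters, the linear shrink schedule, and the `evaluate` callback (a stand-in for validation performance) are all assumptions made for illustration.

```python
import random


def progressive_search(candidates, evaluate, m, n_iters=200,
                       sample_size=4, final_keep=8, n_eval=50, seed=0):
    """Toy two-stage search: shrink the candidate set, then pick m items.

    `evaluate(subset)` is a hypothetical stand-in for validation performance.
    """
    rng = random.Random(seed)
    space = list(candidates)
    initial = len(space)
    final_keep = max(final_keep, m)
    total = {p: 0.0 for p in space}   # accumulated score per candidate
    seen = {p: 0 for p in space}      # how often each candidate was sampled

    def mean_score(p):
        # Candidates that were never sampled are kept optimistically.
        return total[p] / seen[p] if seen[p] else float("inf")

    # Stage 1: each step scores only a fixed-size sample of the current space,
    # then the space is trimmed toward `final_keep` (assumed linear schedule).
    for step in range(n_iters):
        subset = rng.sample(space, min(sample_size, len(space)))
        score = evaluate(subset)
        for p in subset:
            total[p] += score
            seen[p] += 1
        progress = (step + 1) / n_iters
        target = max(final_keep,
                     round(initial - (initial - final_keep) * progress))
        if len(space) > target:
            space = sorted(space, key=mean_score, reverse=True)[:target]

    # Stage 2: sample several size-m subsets from the shrunken space and
    # keep the one with the best evaluation score.
    best_subset, best_score = None, float("-inf")
    for _ in range(n_eval):
        subset = rng.sample(space, m)
        score = evaluate(subset)
        if score > best_score:
            best_subset, best_score = subset, score
    return sorted(best_subset)


# Toy usage: 20 hypothetical meta-path names with made-up usefulness values.
paths = [f"mp{i:02d}" for i in range(20)]
usefulness = {p: (i % 5) / 5 for i, p in enumerate(paths)}
chosen = progressive_search(paths, lambda s: sum(usefulness[p] for p in s), m=3)
```

In this sketch the number of candidates scored per search step is fixed by `sample_size`, so the evaluation work per step does not grow with the size of the candidate set; the final model would then be trained only on the `m` selected candidates.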

Abstract

Utilizing long-range dependency, a concept extensively studied in homogeneous graphs, remains underexplored in heterogeneous graphs, especially on large ones, posing two significant challenges: reducing computational costs while maximizing effective information utilization in the presence of heterogeneity, and overcoming the over-smoothing issue in graph neural networks. To address this gap, we investigate the importance of different meta-paths and introduce an automatic framework for utilizing long-range dependency on heterogeneous graphs, denoted as Long-range Meta-path Search through Progressive Sampling (LMSPS). Specifically, we develop a search space with all meta-paths related to the target node type. By employing a progressive sampling algorithm, LMSPS dynamically shrinks the search space with hop-independent time complexity. Through a sampling evaluation strategy, LMSPS conducts a specialized and effective meta-path selection, leading to retraining with only effective meta-paths, thus mitigating costs and over-smoothing. Extensive experiments across diverse heterogeneous datasets validate LMSPS's capability in discovering effective long-range meta-paths, surpassing state-of-the-art methods. Our code is available at https://github.com/JHL-HUST/LMSPS.
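To make the size of a search space with "all meta-paths related to the target node type" concrete, the following minimal sketch enumerates every meta-path that ends at a target type, up to a maximum hop. The schema `SCHEMA` is a hypothetical bibliographic-style example (author, paper, term, venue), and `enumerate_meta_paths` is an illustrative helper, not part of the authors' code.

```python
# Hypothetical schema: which node types are adjacent to each node type.
SCHEMA = {
    "A": ["P"],            # author - paper
    "P": ["A", "T", "V"],  # paper - author / term / venue
    "T": ["P"],            # term - paper
    "V": ["P"],            # venue - paper
}


def enumerate_meta_paths(schema, target, max_hop):
    """Return all meta-paths with 0..max_hop hops that end at `target`.

    A meta-path is a tuple of node types; the last entry is the target type.
    """
    paths = [(target,)]       # the 0-hop meta-path
    frontier = [(target,)]    # meta-paths with exactly the current hop count
    for _ in range(max_hop):
        next_frontier = []
        for path in frontier:
            # Extend the path by one hop on the source side.
            for neighbor_type in schema[path[0]]:
                next_frontier.append((neighbor_type,) + path)
        paths.extend(next_frontier)
        frontier = next_frontier
    return paths


counts = [len(enumerate_meta_paths(SCHEMA, "A", h)) for h in range(7)]
```

On this toy schema the number of candidates grows from 5 at a maximum hop of 2 to 53 at a maximum hop of 6, which illustrates why exhaustively aggregating every meta-path becomes costly as the hop limit increases.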

Paper Structure (37 sections, 9 equations, 6 figures, 11 tables, 1 algorithm)

Introduction
Preliminaries
Related Works
Motivation of Long-range Meta-path Search
The Proposed Method
Progressive Sampling Search
Sampling Evaluation
Discussion on Differences with Prior Works
Experiments and Analysis
Datasets and Baselines
Performance Analysis
Analysis on Large Maximum Hops
Effectiveness of the Search Algorithm and Searched Meta-paths
Necessity of Long-range Dependency
Ablation Study
...and 22 more sections

Figures (6)

Figure 1: Analysis of the importance of different meta-paths. (a) illustrates the results after removing a single meta-path on DBLP. (b) shows the performance of utilizing a single meta-path on DBLP. (c) illustrates the performance after removing a subset of meta-paths on ACM.
Figure 2: The overall framework of LMSPS. Based on the progressive sampling and sampling evaluation in the search stage, the training stage employs $M$ effective meta-paths instead of the full $K$ target-node-related meta-paths. It exhibits aggregation of meta-paths with maximum hop $2$, i.e., $0$, $1$, and $2$-hop meta-paths. The weight updates of feature projection are not shown for ease of illustration.
Figure 3: Illustration of (a) performance, (b) memory cost, (c) average training time of Simple-HGN, SeHGNN, and LMSPS relative to the maximum hop or layer on DBLP. The gray dotted line in (a) indicates the number of target-node-related meta-paths under different maximum hops, which is exponential.
Figure 4: Micro-F1 scores, time consumption, and parameters of various HGNNs on DBLP and ACM. GTN is omitted for ease of illustration because its time consumption and parameter count are large.
Figure 5: Micro-F1 with respect to different hyper-parameter $M$ on DBLP, IMDB, and ACM.
...and 1 more figures
