Heterogeneous Graph Neural Network on Semantic Tree

Mingyu Guan, Jack W. Stokes, Qinlong Luo, Fuchen Liu, Purvanshi Mehta, Elnaz Nouri, Taesoo Kim

TL;DR

This work addresses the limitation of existing heterogeneous GNNs in modeling the hierarchical relationships among metapaths. It proposes HetTree, which constructs a semantic tree over metapaths and uses a novel subtree attention mechanism to encode these hierarchies, while performing offline feature aggregation and carefully matching per-metapath features with corresponding labels. Empirical results on open benchmarks and a real-world commercial email graph show HetTree achieves state-of-the-art performance and scalable efficiency, with ablations confirming the value of subtree attention and label usage. The approach offers a principled, scalable framework for leveraging structured metapath information in heterogeneous graphs, with potential for broader application to large-scale, multi-relational datasets.

Abstract

Recent years have seen increasing interest in Heterogeneous Graph Neural Networks (HGNNs), since many real-world graphs, from citation graphs to email graphs, are heterogeneous in nature. However, existing methods ignore the tree hierarchy among metapaths that is naturally constituted by the different node types and relation types. In this paper, we present HetTree, a novel HGNN that models both the graph structure and heterogeneous aspects in a scalable and effective manner. Specifically, HetTree builds a semantic tree data structure to capture the hierarchy among metapaths. To effectively encode the semantic tree, HetTree uses a novel subtree attention mechanism to emphasize metapaths that are more helpful in encoding parent-child relationships. Moreover, HetTree carefully matches pre-computed features with their corresponding labels, which together constitute a complete metapath representation. Our evaluation of HetTree on a variety of real-world datasets demonstrates that it outperforms all existing baselines on open benchmarks and efficiently scales to large real-world graphs with millions of nodes and edges.
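
To make the idea of a tree hierarchy among metapaths concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a hypothetical three-type email scheme (`Sender`, `Email`, `Recipient`) with made-up relation names, represents a metapath as a tuple of node types, and assumes that the parent of a metapath is the same metapath with its last hop removed. The function names `enumerate_metapaths` and `build_semantic_tree` are illustrative only.

```python
from collections import defaultdict

# Hypothetical relational scheme: (source type, relation, destination type).
# These types and relation names are assumptions made for illustration.
SCHEMA = [
    ("Sender", "sends", "Email"),
    ("Email", "sent_by", "Sender"),
    ("Email", "received_by", "Recipient"),
    ("Recipient", "receives", "Email"),
]


def enumerate_metapaths(schema, target, k):
    """Return all metapaths with 0..k hops that start at `target`.

    Each metapath is a tuple of node types, listed level by level.
    """
    successors = defaultdict(list)
    for src, _relation, dst in schema:
        successors[src].append(dst)

    levels = [[(target,)]]
    for _ in range(k):
        # Extend every metapath of the previous level by one hop.
        levels.append(
            [path + (nxt,) for path in levels[-1] for nxt in successors[path[-1]]]
        )
    return [path for level in levels for path in level]


def build_semantic_tree(metapaths):
    """Map each metapath to its child metapaths.

    Assumption: the parent of a metapath is its one-hop-shorter prefix.
    """
    children = {path: [] for path in metapaths}
    for path in metapaths:
        if len(path) > 1:
            children[path[:-1]].append(path)
    return children


paths = enumerate_metapaths(SCHEMA, "Sender", k=2)
tree = build_semantic_tree(paths)
for parent, kids in tree.items():
    print(" -> ".join(parent), "| children:", len(kids))
```

Under these assumptions the root of the tree is the zero-hop metapath of the target node type, and the tree has height `k`. An encoder over such a tree could then weight the children of each tree node, which is where an attention mechanism over subtrees would fit.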

Paper Structure (16 sections, 6 equations, 5 figures, 5 tables)

Figures (5)

  • Figure 1: (a) Relational scheme of a heterogeneous email graph. (b) An example of the email graph.
  • Figure 2: (a) The offline process of feature aggregation. The center node is the target $Sender$ node and features are aggregated for all metapaths $\mathcal{P}^k$ up to hop $k$, where $k=2$ in this example. (b) The offline process of label aggregation on partially observed labels in the training set. (c) Semantic tree with height of $k$ for Sender nodes in the email graph. A tree node $C_P$ represents metapath $P$, where $P \in \mathcal{P}^k$.
  • Figure 3: Semantic tree aggregation in HetTree.
  • Figure 4: ROC Curves for the email dataset. The error bars for HetTree are tiny.
  • Figure 5: Epoch time and memory usage on HGB datasets.

Theorems & Definitions (4)

  • Definition 3.1
  • Definition 3.2
  • Remark 3.3
  • Definition 3.4