SynHING: Synthetic Heterogeneous Information Network Generation for Graph Learning and Explanation

Ming-Yi Hong, Yi-Hsiang Huang, Shao-En Lin, You-Chen Teng, Chih-Yu Wang, Che Lin

TL;DR

SynHING tackles the scarcity of diverse, ground-truth explanations for heterogeneous information networks by providing a motif-driven, bottom-up synthetic HIN generation framework. It introduces major motif generation, base subgraph construction, intra-/inter-cluster merges, node feature generation, and post-pruning to produce scalable graphs that preserve the reference graph's properties while exposing ground-truth explanations. The framework enables ground-truth explanations for HGNNs and supports pretraining transfer experiments, with controllable cluster exclusion via intra-/inter-cluster probabilities and SNR-driven features. Experimental results on IMDB, ACM, and DBLP show meaningful positive transfer from synthetic to real graphs and highlight the role of motifs in explainability and learning, offering a practical tool for robust evaluation and pretraining in heterogeneous graph learning.

Abstract

Graph Neural Networks (GNNs) excel in delineating graph structures in diverse domains, including community analysis and recommendation systems. As the interpretation of GNNs becomes increasingly important, the demand for robust baselines and expansive graph datasets grows, particularly in the context of Heterogeneous Information Networks (HINs). Addressing this, we introduce SynHING, a novel framework for Synthetic Heterogeneous Information Network Generation aimed at enhancing graph learning and explanation. SynHING systematically identifies major motifs in a target HIN and employs a bottom-up generation process with intra-cluster and inter-cluster merge modules. This process, supplemented by post-pruning techniques, ensures the synthetic HIN closely mirrors the original graph's structural and statistical properties. Crucially, SynHING provides ground-truth motifs for evaluating GNN explainer models, setting a new standard for explainable, synthetic HIN generation and contributing to the advancement of interpretable machine learning in complex networks.
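
To make the bottom-up idea concrete, the following is a minimal toy sketch of how intra-cluster and inter-cluster merges might be simulated. It is not the authors' implementation. The function name `generate_toy_hin`, the star-shaped base subgraph (one target node linked to one minor node per type), the node types `director` and `actor`, and the way the probabilities `p_intra` and `p_inter` are applied are all assumptions made for illustration. Post-pruning and node feature generation are omitted.

```python
import random
from collections import defaultdict


def generate_toy_hin(num_clusters, per_cluster, minor_types, p_intra, p_inter, seed=0):
    """Return (target, minor) edges of a toy HIN built from star-shaped base subgraphs.

    Hypothetical simplification: each base subgraph is one target node plus one
    minor node per type; a merge replaces the subgraph's own minor node with an
    already-placed minor node of the same type.
    """
    rng = random.Random(seed)
    # pools[(node_type, cluster)] holds minor nodes already placed in the graph
    pools = defaultdict(list)
    edges = []

    # Visit base subgraphs in random order so any cluster can merge with any other
    order = [(c, i) for c in range(num_clusters) for i in range(per_cluster)]
    rng.shuffle(order)

    for c, i in order:
        target = ("target", c, i)
        for t in minor_types:
            r = rng.random()
            if r < p_intra:
                # Intra-cluster merge: reuse a minor node from the same cluster
                candidates = pools[(t, c)]
            elif r < p_intra + p_inter:
                # Inter-cluster merge: reuse a minor node from a different cluster
                candidates = [
                    node
                    for (t2, c2), nodes in pools.items()
                    if t2 == t and c2 != c
                    for node in nodes
                ]
            else:
                candidates = []

            if candidates:
                minor = rng.choice(candidates)
            else:
                # No merge (or nothing to merge with): keep this subgraph's own node
                minor = (t, c, i)
                pools[(t, c)].append(minor)
            edges.append((target, minor))
    return edges


if __name__ == "__main__":
    edges = generate_toy_hin(3, 4, ["director", "actor"], p_intra=0.6, p_inter=0.1)
    minors = {m for _, m in edges}
    print(len(edges), "edges,", len(minors), "distinct minor nodes")
```

In this sketch, a higher `p_intra` makes target nodes in the same cluster share more minor nodes, while a higher `p_inter` creates links across clusters and so blurs the cluster boundaries.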

Paper Structure (33 sections, 8 equations, 12 figures, 4 tables)

Figures (12)

  • Figure 1: Synthetic HIN Generation Flow
  • Figure 2: SynHING
  • Figure 3: Intra-Cluster and Inter-Cluster Merges
  • Figure 4: Graph Schema and Major Motifs of the Three Heterogeneous Graph Datasets
  • Figure 5: Visualization of Synthetic IMDB in Different Intra-/Inter-Cluster Probabilities. Dark blue represents minor nodes, and other colors indicate target nodes on different labels.
  • ...and 7 more figures