SynHING: Synthetic Heterogeneous Information Network Generation for Graph Learning and Explanation

Ming-Yi Hong; Yi-Hsiang Huang; Shao-En Lin; You-Chen Teng; Chih-Yu Wang; Che Lin

TL;DR

SynHING addresses the scarcity of diverse heterogeneous information network (HIN) datasets with ground-truth explanations by providing a motif-driven, bottom-up framework for synthetic HIN generation. It combines major motif generation, base subgraph generation, intra-/inter-cluster merges, node feature generation, and post-pruning to produce scalable graphs that preserve the reference graph's properties while exposing ground-truth explanations. These explanations can be used to evaluate explainers for heterogeneous GNNs (HGNNs), and the synthetic graphs support pretraining transfer experiments, with cluster exclusion controlled via intra-/inter-cluster probabilities and with SNR-driven node features. Experiments on IMDB, ACM, and DBLP show positive transfer from synthetic to real graphs and highlight the role of motifs in explainability and learning, making SynHING a practical tool for robust evaluation and pretraining in heterogeneous graph learning.

Abstract

Graph Neural Networks (GNNs) excel in delineating graph structures in diverse domains, including community analysis and recommendation systems. As the interpretation of GNNs becomes increasingly important, the demand for robust baselines and expansive graph datasets is accentuated, particularly in the context of Heterogeneous Information Networks (HIN). Addressing this, we introduce SynHING, a novel framework for Synthetic Heterogeneous Information Network Generation aimed at enhancing graph learning and explanation. SynHING systematically identifies major motifs in a target HIN and employs a bottom-up generation process with intra-cluster and inter-cluster merge modules. This process, supplemented by post-pruning techniques, ensures the synthetic HIN closely mirrors the original graph's structural and statistical properties. Crucially, SynHING provides ground-truth motifs for evaluating GNN explainer models, setting a new standard for explainable, synthetic HIN generation and contributing to the advancement of interpretable machine learning in complex networks.

Paper Structure (33 sections, 8 equations, 12 figures, 4 tables)

Introduction
Related Work
Synthetic Graph Generation
Explainer for Graph Neural Networks
Datasets with Ground-Truth Explanations
Proposed Method: SynHING
Preliminaries
Overview of SynHING
Major Motif Generation (MMG)
Base Subgraph Generation (BSG)
Merge to Generate HINs
Intra-Cluster Merge (Intra-CM)
Inter-Cluster Merge (Inter-CM)
Node Feature Generation (NFG)
Post-Pruning (P-P)
...and 18 more sections

Figures (12)

Figure 1: Synthetic HIN Generation Flow
Figure 2: SynHING
Figure 3: Intra-Cluster and Inter-Cluster Merges
Figure 4: Graph Schema and Major Motifs of the Three Heterogeneous Graph Datasets
Figure 5: Visualization of Synthetic IMDB under Different Intra-/Inter-Cluster Probabilities. Dark blue represents minor nodes, and other colors indicate target nodes with different labels.
...and 7 more figures
