A GAN Approach for Node Embedding in Heterogeneous Graphs Using Subgraph Sampling

Hung-Chun Hsu, Bo-Jun Wu, Ming-Yi Hong, Che Lin, Chih-Yu Wang

TL;DR

Evaluations on multiple real-world datasets show that the method outperforms baseline models, particularly on tasks focused on identifying minority node classes, with notable improvements in F-score and AUC-PRC.

Abstract

Graph neural networks (GNNs) face significant challenges with class imbalance, which biases their inference results. To address this issue in heterogeneous graphs, we propose a novel framework that combines a GNN with a generative adversarial network (GAN) to improve classification of underrepresented node classes. The framework incorporates an edge generation and selection module that creates synthetic nodes and edges simultaneously through adversarial learning. Previous methods predominantly focus on homogeneous graphs because heterogeneous graph structures are difficult to represent in matrix form; in contrast, this approach is designed specifically for heterogeneous data. Existing solutions also often rely on pre-trained models to incorporate synthetic nodes, which can lead to optimization inconsistencies and mismatched data representations. Our framework avoids these pitfalls by generating data that aligns closely with the inherent graph topology and attributes, so that the synthetic data integrates cohesively with the original graph. Evaluations on multiple real-world datasets show that the method outperforms baseline models, particularly on tasks focused on identifying minority node classes, with notable improvements in F-score and AUC-PRC. These findings highlight the potential of this approach for addressing class imbalance in heterogeneous graphs.
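The abstract reports gains in F-score and AUC-PRC for minority node classes. The page does not state how these metrics are averaged or thresholded, so the following is only a minimal sketch of one common single-class convention, assuming a binary minority-vs-rest view: F1 from thresholded predictions, and the area under the precision-recall curve computed as step-wise average precision. The function names, the `positive` label, and the 0.5 threshold in the demo are illustrative assumptions, not the paper's evaluation code.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[float, float, float]:
    """Precision, recall and F1 for a single class (e.g. a minority node class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def average_precision(
    y_true: Sequence[int], scores: Sequence[float], positive: int = 1
) -> float:
    """Step-wise area under the precision-recall curve (average precision).

    Examples are ranked by descending score and the precision at the rank of
    each positive example is averaged. Tied scores keep their input order
    because sorted() is stable, which can differ from libraries that group ties.
    """
    n_pos = sum(1 for t in y_true if t == positive)
    if n_pos == 0:
        return 0.0
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == positive:
            hits += 1
            total += hits / rank  # precision at this cut-off
    return total / n_pos


if __name__ == "__main__":
    y_true = [1, 0, 1, 0, 0, 0]              # 1 marks the minority class
    scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]  # classifier scores for class 1
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # illustrative threshold
    print(precision_recall_f1(y_true, y_pred))  # approx. (0.667, 1.0, 0.8)
    print(average_precision(y_true, scores))    # approx. 0.833, i.e. (1 + 2/3) / 2
```

F1 depends on a decision threshold, while AUC-PRC summarizes ranking quality over all thresholds; both focus on the positive (minority) class rather than on overall accuracy, which is why they are informative under class imbalance.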

Paper Structure (37 sections, 15 equations, 3 figures, 7 tables, 2 algorithms)

Figures (3)

  • Figure 1: FlashGAN Training Workflow: FlashGAN processes subgraphs in batches. The workflow for a single subgraph is as follows: (1) Extract subgraph $SG$ from the original graph $G$. (2) Input subgraph embedding $\textbf{g}$ into the synthetic node generator $\Omega$. (3) Connect generated synthetic nodes $\Pi = \{ \pi_1, \pi_2 \}$ to all nodes in the sampled subgraph $SG$, forming the augmented subgraph $SG_{\text{aug}}$. (4) Pass real edges $F_{\text{real}}$ through a synthetic edge filter $\Delta_r$, obtain the edge threshold $\eta_r^{*}$, and split synthetic edges $F_{\text{potential}}$ into retained and discarded groups, $F_{\text{retained}}$ and $F_{\text{discarded}}$. (5) Obtain filtered augmented subgraph $SG_{\Delta}$ with selected synthetic nodes and retained edges. (6) Update the generator based on the discriminator’s classification of retained and discarded edges.
  • Figure 2: Influence of Imbalance Ratio
  • Figure 3: Investigation of the Synthetic Edge Filter
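Steps (3) to (5) of the FlashGAN workflow in Figure 1 (connect synthetic nodes, threshold with real edges, split potential edges) can be illustrated with a small, dependency-free sketch. It is not FlashGAN's implementation, and several pieces are assumptions: the learned filter $\Delta_r$ is replaced by a hypothetical `edge_score` (sigmoid of an embedding dot product), the threshold $\eta_r^{*}$ is assumed to be a low quantile of the real-edge scores, and a synthetic node is assumed to be kept only if at least one of its edges is retained. The names `edge_threshold`, `filter_synthetic_edges`, and the `quantile` parameter are illustrative. The sketch handles a single edge type $r$; for a heterogeneous graph one would presumably repeat it per edge type. Step (6), the adversarial update of the generator, is not covered.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Edge = Tuple[str, str]


def edge_score(a: Vector, b: Vector) -> float:
    """Hypothetical stand-in for the learned filter Delta_r: sigmoid(a . b)."""
    return 1.0 / (1.0 + math.exp(-sum(x * y for x, y in zip(a, b))))


def edge_threshold(
    real_edges: List[Edge], emb: Dict[str, Vector], quantile: float = 0.1
) -> float:
    """Assumed rule for eta_r^*: a low quantile of the real edges' filter scores."""
    if not real_edges:
        raise ValueError("need at least one real edge to set the threshold")
    scores = sorted(edge_score(emb[u], emb[v]) for u, v in real_edges)
    return scores[int(quantile * (len(scores) - 1))]


def filter_synthetic_edges(
    subgraph_nodes: List[str],
    synthetic: Dict[str, Vector],
    real_edges: List[Edge],
    emb: Dict[str, Vector],
    quantile: float = 0.1,
) -> Tuple[List[Edge], List[Edge], float, List[str]]:
    """Sketch of steps (3)-(5): connect, threshold, split, keep nodes."""
    # Step (3): connect every synthetic node to every node of the sampled subgraph.
    potential = [(s, n) for s in synthetic for n in subgraph_nodes]

    # Step (4): derive the threshold from real edges, then split the potential edges.
    eta = edge_threshold(real_edges, emb, quantile)
    all_emb = {**emb, **synthetic}
    retained: List[Edge] = []
    discarded: List[Edge] = []
    for s, n in potential:
        if edge_score(all_emb[s], all_emb[n]) >= eta:
            retained.append((s, n))
        else:
            discarded.append((s, n))

    # Step (5): assume a synthetic node survives only if it keeps at least one edge.
    kept = [s for s in synthetic if any(e[0] == s for e in retained)]
    return retained, discarded, eta, kept


if __name__ == "__main__":
    emb = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}  # real subgraph nodes
    real_edges = [("a", "b"), ("b", "c")]
    synthetic = {"s1": [1.0, 0.0], "s2": [-1.0, -1.0]}  # made-up generator output
    retained, discarded, eta, kept = filter_synthetic_edges(
        ["a", "b", "c"], synthetic, real_edges, emb
    )
    augmented_edges = real_edges + retained  # edge list of the filtered subgraph
    print(kept, augmented_edges)
    # ['s1'] [('a', 'b'), ('b', 'c'), ('s1', 'a'), ('s1', 'b')]
```

Calibrating the threshold on real edges ties the acceptance rule to the scores that genuine connections receive, so a synthetic edge is retained only if it looks at least as plausible as the weaker real edges; in the toy run, `s2` has no edge above that bar and is dropped.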