VIGraph: Generative Self-supervised Learning for Class-Imbalanced Node Classification

Yulan Hu, Sheng Ouyang, Zhirui Yang, Yong Liu

TL;DR

VIGraph is a simple yet effective generative SSL approach built on the Variational GAE. It strictly adheres to the concept of imbalance when constructing imbalanced graphs and leverages the variational inference (VI) ability of the Variational GAE to generate nodes for minority classes.

Abstract

Class imbalance in graph data presents significant challenges for node classification. While existing methods, such as SMOTE-based approaches, partially mitigate this issue, they still exhibit limitations in constructing imbalanced graphs. Generative self-supervised learning (SSL) methods, exemplified by graph autoencoders (GAEs), offer a promising solution by directly generating minority nodes from the data itself, yet their potential remains underexplored. In this paper, we examine the shortcomings of SMOTE-based approaches in the construction of imbalanced graphs. Furthermore, we introduce VIGraph, a simple yet effective generative SSL approach that relies on the Variational GAE as the fundamental model. VIGraph strictly adheres to the concept of imbalance when constructing imbalanced graphs and leverages the variational inference (VI) ability of the Variational GAE to generate nodes for minority classes. VIGraph introduces comprehensive training strategies, including cross-view contrastive learning at the decoding phase to capture semantic knowledge, adjacency matrix reconstruction to preserve graph structure, and an alignment strategy to ensure stable training. VIGraph can generate high-quality nodes that are directly usable for classification, eliminating both the need to integrate the generated nodes back into the graph and the additional retraining that SMOTE-based methods require. Extensive experiments demonstrate the superiority and generality of our approach.
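
The abstract's idea of generating minority nodes through variational inference can be illustrated with the standard reparameterization trick used by variational autoencoders. The following is a minimal sketch, not the authors' implementation: the encoder outputs `mu` and `log_var` are random placeholders standing in for a trained variational GNN encoder, the function names are hypothetical, and the KL term against a standard normal is the usual VAE regularizer, assumed here as one plausible form of the alignment strategy.

```python
import numpy as np

def sample_latent(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """Mean over nodes of KL(N(mu, sigma^2) || N(0, I))."""
    kl_per_node = -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=1)
    return float(kl_per_node.mean())

rng = np.random.default_rng(0)

# Placeholder encoder outputs for 3 minority-class nodes, latent dimension 4.
# In a real model these would come from the variational GNN encoder.
mu = rng.normal(size=(3, 4))
log_var = np.full((3, 4), -2.0)

# Draw several latent samples per minority node; each sample is a candidate
# synthetic embedding for that node's class (an assumption of this sketch).
num_samples = 5
generated = np.concatenate(
    [sample_latent(mu, log_var, rng) for _ in range(num_samples)], axis=0
)

kl = kl_to_standard_normal(mu, log_var)
```

Because each draw uses fresh noise, repeated sampling around the same `mu` yields distinct embeddings, which is what makes the latent distribution usable as a generator for under-represented classes.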

Paper Structure (16 sections, 9 equations, 3 figures, 3 tables)

Figures (3)

  • Figure 1: Two ways of constructing an imbalanced graph. One panel depicts the construction method adopted by the SMOTE-based approaches, wherein specific nodes (node 2 and node 4) are masked while the edges connected to them are retained. Conversely, a second panel illustrates the rigorous construction, where both the nodes and the edges connected to them are removed. Additionally, a third panel shows the change in accuracy of GraphSmote observed under these two construction methods on Cora and CiteSeer.
  • Figure 2: Overview of VIGraph. Initially, the input graph $\mathcal{G}$ is balanced. We manually remove a portion of nodes along with their linked edges to construct an imbalanced graph, denoted as $\bar{\mathcal{G}}$. Subsequently, the variational GNN encoder takes the imbalanced graph as input and performs pairwise encoding to obtain two latent representations, $Z_1$ and $Z_2$. Following this, the GNN decoder reconstructs the imbalanced graph based on $Z_1$ and $Z_2$, resulting in $\tilde{\mathcal{G}}_1$ and $\tilde{\mathcal{G}}_2$, respectively. Additionally, we introduce three strategies to enhance training: structure reconstruction between $\bar{\mathcal{G}}$ and $\tilde{\mathcal{G}}_2$ (or $\tilde{\mathcal{G}}_1$), cross-view contrastive learning, and distribution alignment between the latent representation and the posterior distribution.
  • Figure 3: The performance across various imbalance rates.
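
The two construction methods contrasted in the Figure 1 caption can be sketched on a toy graph. This is an illustrative sketch under stated assumptions: the function names `mask_nodes_keep_edges` and `remove_nodes_and_edges` are hypothetical, the graph is a 5-node path chosen for the example, and the dropped indices 2 and 4 merely echo the caption's node numbers (treated here as 0-based array indices).

```python
import numpy as np

def mask_nodes_keep_edges(adj, train_mask, drop):
    """Masking construction: dropped nodes lose their training label,
    but the adjacency matrix (and thus their edges) is left untouched."""
    new_mask = train_mask.copy()
    new_mask[drop] = False
    return adj.copy(), new_mask

def remove_nodes_and_edges(adj, features, labels, drop):
    """Rigorous construction: dropped nodes and every edge touching them
    are removed, yielding a smaller induced subgraph."""
    keep = np.setdiff1d(np.arange(adj.shape[0]), drop)
    return adj[np.ix_(keep, keep)], features[keep], labels[keep], keep

# Toy path graph 0 - 1 - 2 - 3 - 4 with symmetric adjacency.
n = 5
adj = np.zeros((n, n), dtype=int)
for u in range(n - 1):
    adj[u, u + 1] = adj[u + 1, u] = 1

features = np.arange(n * 2, dtype=float).reshape(n, 2)
labels = np.array([0, 0, 1, 0, 1])
train_mask = np.ones(n, dtype=bool)
drop = np.array([2, 4])  # minority-class nodes to drop

adj_masked, mask_after = mask_nodes_keep_edges(adj, train_mask, drop)
adj_strict, x_strict, y_strict, kept = remove_nodes_and_edges(
    adj, features, labels, drop
)

print("edges kept when masking:", adj_masked.sum() // 2)
print("edges kept when removing:", adj_strict.sum() // 2)
```

In the masking variant, message passing can still reach the hidden nodes through their retained edges, whereas the removal variant leaves only the subgraph induced by the kept nodes; in this toy example node 3 ends up isolated.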