Overcoming Class Imbalance: Unified GNN Learning with Structural and Semantic Connectivity Representations

Abdullah Alchihabi, Hao Yan, Yuhong Guo

TL;DR

This work tackles the problem of class-imbalanced node classification on graphs, where minority classes suffer from scarce labeled data and GNNs exhibit bias toward majority classes. It introduces Uni-GNN, a unified framework that integrates structural and semantic connectivity through two dedicated encoders and a balanced classifier, complemented by a balanced pseudo-label generation mechanism to exploit unlabeled nodes. The approach defines a shortest-path–based structural adjacency $A_{\text{struct}}$ and a semantically informed adjacency $A^{\ell}_{\text{sem}}$ built from fine-grained clusters, enabling diffusion of discriminative information beyond local neighborhoods. Empirical results on Cora, CiteSeer, and PubMed demonstrate that Uni-GNN consistently outperforms baselines, including LTE4G and various graph-imbalance methods, across multiple imbalance ratios and minority-class configurations, with ablations confirming the contributions of each component. The framework offers a principled, scalable solution to under-reaching and neighborhood memorization, improving minority-class generalization while leveraging abundant unlabeled data.

Abstract

Class imbalance is pervasive in real-world graph datasets, where the majority of annotated nodes belong to a small set of classes (majority classes), leaving many other classes (minority classes) with only a handful of labeled nodes. Graph Neural Networks (GNNs) suffer from significant performance degradation in the presence of class imbalance, exhibiting bias towards majority classes and struggling to generalize effectively on minority classes. This limitation stems, in part, from the message passing process, leading GNNs to overfit to the limited neighborhood of annotated nodes from minority classes and impeding the propagation of discriminative information throughout the entire graph. In this paper, we introduce a novel Unified Graph Neural Network Learning (Uni-GNN) framework to tackle class-imbalanced node classification. The proposed framework seamlessly integrates both structural and semantic connectivity representations through semantic and structural node encoders. By combining these connectivity types, Uni-GNN extends the propagation of node embeddings beyond immediate neighbors, encompassing non-adjacent structural nodes and semantically similar nodes, enabling efficient diffusion of discriminative information throughout the graph. Moreover, to harness the potential of unlabeled nodes within the graph, we employ a balanced pseudo-label generation mechanism that augments the pool of available labeled nodes from minority classes in the training set. Experimental results underscore the superior performance of our proposed Uni-GNN framework compared to state-of-the-art class-imbalanced graph learning baselines across multiple benchmark datasets.
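
The balanced pseudo-label generation mentioned in the abstract can be illustrated with a small sketch. It is not the authors' procedure. It assumes that each unlabeled node has a predicted class distribution, that a node is kept only when its top probability reaches a confidence threshold (called `epsilon` here, after the threshold $\epsilon$ named in the Figure 2 caption), and that each class may receive at most as many pseudo-labels as it needs to match the largest labeled class. The function name, the input layout, and the budget rule are all hypothetical.

```python
def balanced_pseudo_labels(probs, labeled_counts, epsilon):
    """Pick confident pseudo-labels while favoring under-represented classes.

    probs: dict mapping unlabeled node id -> list of class probabilities.
    labeled_counts: list with the number of labeled nodes per class.
    epsilon: minimum confidence required to accept a pseudo-label.
    The per-class budget rule below is an assumption for illustration.
    """
    target = max(labeled_counts)  # size of the largest labeled class
    candidates = {c: [] for c in range(len(labeled_counts))}

    # Group confident predictions by their predicted class.
    for node, p in probs.items():
        cls = max(range(len(p)), key=p.__getitem__)
        if p[cls] >= epsilon:
            candidates[cls].append((p[cls], node))

    selected = {}
    for cls, items in candidates.items():
        # A class already at the target size receives no pseudo-labels.
        budget = max(target - labeled_counts[cls], 0)
        # Most confident predictions first.
        for _, node in sorted(items, reverse=True)[:budget]:
            selected[node] = cls
    return selected


labeled_counts = [5, 1]  # class 0 is the majority, class 1 the minority
probs = {
    10: [0.95, 0.05],
    11: [0.10, 0.90],
    12: [0.20, 0.80],
    13: [0.45, 0.55],
}
pseudo = balanced_pseudo_labels(probs, labeled_counts, epsilon=0.7)
```

In this toy input only minority-class nodes receive pseudo-labels: node 10 is confident but belongs to the class that is already largest, and node 13 falls below the threshold.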

Paper Structure

This paper contains 20 sections, 14 equations, 2 figures, 5 tables, and 1 algorithm.

Figures (2)

  • Figure 1: Overview of the proposed Unified GNN Learning framework. The structural ($f_{\text{struct}}$) and semantic ($f_{\text{sem}}$) node encoders leverage their respective connectivity matrices—structural ($A_{\text{struct}}$) and semantic ($\{A^{\ell}_{\text{sem}}\}_{\ell=1}^{L}$). The encoders share concatenated node embeddings—structural ($H^{\ell-1}_{\text{struct}}$) and semantic ($H^{\ell-1}_{\text{sem}}$)—at each message passing layer ($\ell$). The balanced node classifier ($\phi$) utilizes the final unified node embeddings ($H^{L}_{\text{struct}}||H^{L}_{\text{sem}}$) for both node classification and balanced pseudo-label generation.
  • Figure 2: Sensitivity analysis for the proposed framework on hyper-parameters (a) $\alpha$, the maximum shortest-path distance (SPD) in $A_{\text{struct}}$; (b) $K$, the number of clusters; (c) $\epsilon$, the pseudo-label confidence threshold; (d) $\beta$, the rate of updating $\{A^{\ell}_{\text{sem}}\}_{\ell=2}^{L}$.
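
The role of $\alpha$ in panel (a) can be made concrete with a short sketch of a shortest-path-based adjacency. The exact definition of $A_{\text{struct}}$ is not given in this summary, so the version below is an assumption: it connects two distinct nodes with a binary entry whenever their shortest-path distance is at most `alpha`, whereas the paper may weight entries by distance. The function name `spd_adjacency` and the edge-list input are hypothetical.

```python
from collections import deque


def spd_adjacency(edges, num_nodes, alpha):
    """Binary adjacency linking nodes within shortest-path distance alpha.

    edges: iterable of undirected (u, v) pairs with 0-based node ids.
    Assumption: entries are 0/1; the paper's A_struct may be weighted.
    """
    neighbors = [[] for _ in range(num_nodes)]
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    a_struct = [[0] * num_nodes for _ in range(num_nodes)]
    for source in range(num_nodes):
        # Breadth-first search truncated at depth alpha.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            if dist[node] == alpha:
                continue  # do not expand beyond the distance limit
            for nxt in neighbors[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    a_struct[source][nxt] = 1
                    queue.append(nxt)
    return a_struct


# Path graph 0 - 1 - 2 - 3
path_edges = [(0, 1), (1, 2), (2, 3)]
a1 = spd_adjacency(path_edges, num_nodes=4, alpha=1)
a2 = spd_adjacency(path_edges, num_nodes=4, alpha=2)
```

With `alpha=1` this reduces to the ordinary adjacency matrix. Raising `alpha` to 2 additionally links nodes two hops apart, which is the sense in which a structural adjacency of this kind lets message passing reach non-adjacent nodes in a single layer.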