Overcoming Class Imbalance: Unified GNN Learning with Structural and Semantic Connectivity Representations
Abdullah Alchihabi, Hao Yan, Yuhong Guo
TL;DR
This work tackles class-imbalanced node classification on graphs, where minority classes have few labeled nodes and GNNs are biased toward majority classes. It introduces Uni-GNN, a unified framework that integrates structural and semantic connectivity through two dedicated encoders and a balanced classifier, complemented by a balanced pseudo-label generation mechanism that exploits unlabeled nodes. The approach defines a shortest-path-based structural adjacency $A_{\text{struct}}$ and a semantic adjacency $A^{\ell}_{\text{sem}}$ built from fine-grained clusters, so that discriminative information can diffuse beyond local neighborhoods. On Cora, CiteSeer, and PubMed, Uni-GNN consistently outperforms baselines, including LTE4G and other class-imbalanced graph learning methods, across multiple imbalance ratios and minority-class configurations, and ablations confirm the contribution of each component. The framework offers a principled, scalable remedy for two failure modes of message passing under imbalance: neighborhood memorization (overfitting to the limited neighborhoods of labeled minority nodes) and under-reaching (discriminative information failing to propagate to distant nodes). Addressing both improves minority-class generalization while making use of abundant unlabeled data.
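To make the idea of a shortest-path-based structural adjacency concrete, the following is a minimal sketch and not the paper's definition of $A_{\text{struct}}$. It assumes that node pairs within `max_hops` hops are connected with weight `1 / distance`; the function name, the `max_hops` cutoff, and the inverse-distance weighting are all hypothetical choices made for illustration.

```python
from collections import deque

import numpy as np


def shortest_path_adjacency(adj_list, max_hops=2):
    """Illustrative structural adjacency (assumed form, not the paper's).

    adj_list[i] lists the neighbors of node i. Entry (i, j) of the result is
    1 / d(i, j) when the shortest-path distance d(i, j) is between 1 and
    max_hops, and 0 otherwise.
    """
    n = len(adj_list)
    a_struct = np.zeros((n, n))
    for src in range(n):
        # Breadth-first search from src, truncated at max_hops.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            if dist[u] == max_hops:
                continue  # do not expand beyond the hop limit
            for v in adj_list[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for v, d in dist.items():
            if d > 0:
                a_struct[src, v] = 1.0 / d
    return a_struct


# Toy path graph 0 - 1 - 2 - 3.
path_graph = [[1], [0, 2], [1, 3], [2]]
A = shortest_path_adjacency(path_graph, max_hops=2)
```

In this toy graph, node 0 becomes connected to node 2 (two hops away) with a smaller weight than its direct neighbor, while node 3 stays out of reach. This illustrates how such a matrix lets message passing reach non-adjacent nodes.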
Abstract
Class imbalance is pervasive in real-world graph datasets, where the majority of annotated nodes belong to a small set of classes (majority classes), leaving many other classes (minority classes) with only a handful of labeled nodes. Graph Neural Networks (GNNs) suffer significant performance degradation under class imbalance: they are biased towards majority classes and struggle to generalize on minority classes. This limitation stems, in part, from the message passing process, which leads GNNs to overfit to the limited neighborhoods of annotated minority-class nodes and impedes the propagation of discriminative information throughout the graph. In this paper, we introduce a novel Unified Graph Neural Network Learning (Uni-GNN) framework to tackle class-imbalanced node classification. The proposed framework integrates structural and semantic connectivity representations through dedicated structural and semantic node encoders. By combining these connectivity types, Uni-GNN extends the propagation of node embeddings beyond immediate neighbors to non-adjacent structural nodes and semantically similar nodes, enabling efficient diffusion of discriminative information throughout the graph. Moreover, to harness the unlabeled nodes in the graph, we employ a balanced pseudo-label generation mechanism that augments the pool of labeled minority-class nodes in the training set. Experimental results show that the proposed Uni-GNN framework outperforms state-of-the-art class-imbalanced graph learning baselines across multiple benchmark datasets.
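As an illustration of what balanced pseudo-label generation could look like, here is a minimal sketch under stated assumptions; the abstract does not specify the selection rule. The sketch assumes that each class may receive at most enough pseudo-labels to match the largest labeled class, and that only unlabeled nodes predicted with confidence of at least `threshold` are eligible. The function name, the `threshold` parameter, and the budget rule are hypothetical.

```python
import numpy as np


def balanced_pseudo_labels(probs, labeled_counts, threshold=0.7):
    """Illustrative class-balanced pseudo-label selection (assumed rule).

    probs: (num_unlabeled, num_classes) predicted class probabilities.
    labeled_counts: number of labeled training nodes per class.
    Returns a dict mapping each class to the selected unlabeled node
    indices, most confident first.
    """
    preds = probs.argmax(axis=1)
    conf = probs.max(axis=1)
    target = max(labeled_counts)  # size of the largest labeled class
    selected = {}
    for c, count in enumerate(labeled_counts):
        budget = target - count  # majority classes receive no extra labels
        candidates = np.where((preds == c) & (conf >= threshold))[0]
        ranked = candidates[np.argsort(-conf[candidates])]
        selected[c] = ranked[:budget].tolist()
    return selected


# Toy example: class 0 has 3 labeled nodes, class 1 has only 1.
probs = np.array([
    [0.90, 0.10],
    [0.20, 0.80],
    [0.40, 0.60],
    [0.10, 0.90],
    [0.95, 0.05],
])
picked = balanced_pseudo_labels(probs, labeled_counts=[3, 1])
```

Here only the minority class gains pseudo-labeled nodes, and the low-confidence prediction for node 2 is left out, so the augmented training set moves toward balance without admitting uncertain labels.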
