Geometric Imbalance in Semi-Supervised Node Classification

Liang Yan; Shengzhong Zhang; Bisheng Li; Menglin Yang; Chen Yang; Min Zhou; Weiyang Ding; Yutong Xie; Zengfeng Huang

TL;DR

This work identifies geometric imbalance (GI) as an angular ambiguity in the unit-sphere embeddings that graph neural networks produce under class imbalance, and formalizes its link to prediction uncertainty through a von Mises–Fisher model on the hypersphere. It proposes UNREAL, a unified, modular framework that mitigates GI with three components: DPAM, which aligns pseudo-labels obtained from clustering with those obtained from model prediction; Node-Reordering, which fuses geometry and confidence and gradually shifts reliance from geometry to confidence; and DGIS, which discards geometrically ambiguous samples. Theoretical results connect GI to prediction entropy and to the class imbalance ratio, and extensive experiments on nine benchmarks (including large-scale datasets) show consistent gains over state-of-the-art baselines, particularly as imbalance intensifies. The approach advances both the theory and the practice of robust semi-supervised node classification on imbalanced graphs, with potential extensions to other graph tasks.
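The TL;DR ties geometric imbalance to prediction uncertainty through a von Mises–Fisher (vMF) model on the hypersphere. A minimal sketch of that link, assuming a vMF mixture whose components share one concentration $\kappa$ and have uniform priors (the paper's exact parameterization may differ): with a shared $\kappa$ the vMF normalizer cancels, so the class posterior is a softmax of $\kappa\,\mu_c^\top z$, and its entropy grows as a unit embedding $z$ drifts toward the angular bisector of two class mean directions. The function names and the 2-D toy setup are illustrative only.

```python
import numpy as np

def vmf_posterior(z, mus, kappa, priors=None):
    """Posterior p(c | z) for a vMF mixture whose components share one kappa.

    With a shared kappa the vMF normalizer C_d(kappa) cancels, leaving
    p(c | z) proportional to pi_c * exp(kappa * mu_c^T z).
    """
    z = np.asarray(z, dtype=float)
    z = z / np.linalg.norm(z)                               # project onto the unit sphere
    mus = np.asarray(mus, dtype=float)
    mus = mus / np.linalg.norm(mus, axis=1, keepdims=True)  # unit mean directions
    k = mus.shape[0]
    priors = np.full(k, 1.0 / k) if priors is None else np.asarray(priors, dtype=float)
    logits = kappa * (mus @ z) + np.log(priors)
    logits -= logits.max()                                  # numerical stability
    p = np.exp(logits)
    return p / p.sum()

def entropy(p):
    """Shannon entropy in nats, skipping zero-probability entries."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

mus = np.array([[1.0, 0.0], [0.0, 1.0]])  # two class mean directions in 2-D
kappa = 10.0                              # assumed shared concentration
for angle_deg in (0, 15, 30, 45):         # angle of z from mu_0 toward mu_1
    t = np.deg2rad(angle_deg)
    z = np.array([np.cos(t), np.sin(t)])
    print(angle_deg, round(entropy(vmf_posterior(z, mus, kappa)), 4))
```

At 45° the embedding is equidistant from both mean directions, so the posterior is uniform and the entropy reaches $\log 2$, the angular ambiguity that the TL;DR associates with GI.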

Abstract

Class imbalance in graph data poses a significant challenge for node classification, particularly in semi-supervised settings. In this work, we formally introduce the concept of geometric imbalance, which captures how message passing on class-imbalanced graphs makes minority-class nodes geometrically ambiguous in an embedding space modeled as a Riemannian manifold. We provide a rigorous theoretical analysis of geometric imbalance on the Riemannian manifold and propose a unified framework that explicitly mitigates it through pseudo-label alignment, node reordering, and ambiguity filtering. Extensive experiments on diverse benchmarks show that our approach consistently outperforms existing methods, especially under severe class imbalance. Our findings offer new theoretical insights and practical tools for robust semi-supervised node classification.
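The abstract names pseudo-label alignment as the first mitigation step, and the TL;DR describes DPAM as aligning pseudo-labels from clustering and prediction. A minimal sketch under stated assumptions: plain k-means on the node embeddings, each cluster center mapped to the class of its nearest prototype (the mean embedding of that class's labeled nodes), and an unlabeled node accepted only when this cluster-derived label agrees with the classifier's prediction. The helper names (`kmeans`, `aligned_pseudo_labels`) and the nearest-prototype mapping are hypothetical, not the paper's code.

```python
import numpy as np

def kmeans(x, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns (centers, assignment)."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].astype(float)
    for _ in range(iters):
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = dist.argmin(1)
        for j in range(k):
            if np.any(assign == j):          # leave empty clusters where they are
                centers[j] = x[assign == j].mean(0)
    dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return centers, dist.argmin(1)

def aligned_pseudo_labels(emb, train_idx, train_y, pred, n_classes, n_clusters, seed=0):
    """Hypothetical DPAM-style filter: keep unlabeled nodes whose
    cluster-derived label agrees with the classifier prediction `pred`."""
    # Class prototypes: mean embedding of the labeled nodes of each class.
    protos = np.stack([emb[train_idx[train_y == c]].mean(0) for c in range(n_classes)])
    centers, assign = kmeans(emb, n_clusters, seed=seed)
    # Map every cluster center to the class of its nearest prototype.
    center_to_class = ((centers[:, None, :] - protos[None, :, :]) ** 2).sum(-1).argmin(1)
    cluster_label = center_to_class[assign]
    unlabeled = np.setdiff1d(np.arange(len(emb)), train_idx)
    keep = unlabeled[cluster_label[unlabeled] == pred[unlabeled]]
    return keep, pred[keep]

# Toy data: two well-separated blobs, two labeled nodes per class.
rng = np.random.default_rng(0)
emb = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(5, 0.3, (20, 2))])
truth = np.repeat([0, 1], 20)
train_idx = np.array([0, 1, 20, 21])
pred = truth.copy()
pred[5] = 1                                  # one deliberately wrong prediction
keep, labels = aligned_pseudo_labels(emb, train_idx, truth[train_idx], pred, 2, 4)
print(len(keep), 5 in keep)
```

Over-clustering (more clusters than classes) is allowed here because each cluster is mapped to a class independently; node 5 is rejected because its cluster points to class 0 while the classifier says 1.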

Paper Structure (65 sections, 3 theorems, 31 equations, 12 figures, 36 tables, 1 algorithm)

Introduction
Preliminaries
Geometric Imbalance
Method
Mitigating Geometric Imbalance with Pseudo-label Alignment
Node-Reordering
Mitigating Learning Bias by Discarding Geometric Imbalanced Samples
Discarding Geometric Imbalanced Samples (DGIS).
Experiment
Experimental Setups
Experimental Results and Analysis
Related Work
Conclusion and Future Work
Appendix
Notations
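The outline lists DGIS, the step that discards geometrically ambiguous samples so they do not bias learning. A minimal sketch, assuming ambiguity is measured by how much closer a node is to its nearest cluster center than to its second-nearest one; the relative-margin criterion, the Euclidean distance, and the `margin` threshold are hypothetical choices, not the paper's definition.

```python
import numpy as np

def dgis_filter(emb, centers, candidates, margin=0.5):
    """Hypothetical DGIS-style filter.

    A candidate is kept only if (d2 - d1) / d1 >= margin, where d1 <= d2 are
    its two smallest Euclidean distances to the cluster centers; nodes near a
    boundary between two centers (d1 close to d2) are discarded.
    """
    emb = np.asarray(emb, dtype=float)
    centers = np.asarray(centers, dtype=float)
    candidates = np.asarray(candidates)
    dist = np.linalg.norm(emb[candidates][:, None, :] - centers[None, :, :], axis=-1)
    d_sorted = np.sort(dist, axis=1)
    d1, d2 = d_sorted[:, 0], d_sorted[:, 1]
    rel_margin = (d2 - d1) / np.maximum(d1, 1e-12)   # guard against d1 == 0
    return candidates[rel_margin >= margin]

centers = np.array([[0.0, 0.0], [4.0, 0.0]])
emb = np.array([[0.5, 0.0], [2.0, 0.0], [3.6, 0.0]])
print(dgis_filter(emb, centers, np.arange(3)))   # → [0 2]; node 1 sits on the boundary
```

A relative margin rather than an absolute one keeps the threshold meaningful when embedding scales differ across datasets; with unit-sphere embeddings an angular distance would be an equally plausible choice.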

Key Result

Theorem 1: Geometric Imbalance vs. Information Entropy

Let $\mathcal{D}^{\text{unlabel}} \subset \mathcal{V}$ be the set of unlabeled nodes, and let $V_{\text{minor}} := \{ v \in \mathcal{V} \mid y_v \in \mathcal{C}_{\text{minor}} \}$ denote the set of all nodes whose ground-truth labels belong to the minority class set $\mathcal{C}_{\text{minor}} \subset \{1, \ldots\}$ … Then the average information entropy $\hat{H}$ of pseudo-label predictions over $\mathcal{D}^{\text{unlabel}}$ …
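Theorem 1 is phrased in terms of the average information entropy $\hat{H}$ of pseudo-label predictions over $\mathcal{D}^{\text{unlabel}}$. A minimal sketch of how that quantity can be computed from a node-by-class probability matrix, assuming $\hat{H}$ is the mean Shannon entropy of each node's predicted distribution; it also shows the same average restricted to $\mathcal{D}^{\text{unlabel}} \cap V_{\text{minor}}$. The toy probabilities, labels, and index sets are made up for illustration.

```python
import numpy as np

def node_entropy(probs):
    """Shannon entropy (nats) of each row of a node-by-class probability matrix."""
    probs = np.asarray(probs, dtype=float)
    plogp = np.zeros_like(probs)
    mask = probs > 0                      # 0 * log 0 is taken as 0
    plogp[mask] = probs[mask] * np.log(probs[mask])
    return -plogp.sum(axis=1)

def average_entropy(probs, idx):
    """Mean entropy over the node subset `idx` (e.g. the unlabeled set)."""
    return float(node_entropy(probs)[idx].mean())

probs = np.array([
    [1.0, 0.0, 0.0],
    [1/3, 1/3, 1/3],
    [0.5, 0.5, 0.0],
    [1/3, 1/3, 1/3],
])
y = np.array([0, 0, 2, 2])                # ground-truth labels
unlabeled = np.array([1, 2, 3])           # stands in for D^unlabel
minor_classes = [2]                       # stands in for C_minor
minor_unlabeled = unlabeled[np.isin(y[unlabeled], minor_classes)]
print(average_entropy(probs, unlabeled), average_entropy(probs, minor_unlabeled))
```

A one-hot prediction contributes zero entropy and a uniform prediction over $k$ classes contributes $\log k$, so $\hat{H}$ rises as more nodes receive ambiguous predictions.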

Figures (12)

Figure 1: Illustration and quantitative analysis of Geometric Imbalance (GI). (a) Conceptual illustration of GI under different pretrained and fine-tuned settings. (b)-(e): t-SNE visualizations of node embeddings in the Cora dataset under four representative cases, showing intra-class compactness and inter-class separation patterns. (f)-(g): Quantitative relationships between GI and (f) average entropy, and (g) class imbalance ratio.
Figure 2: The pipeline of our UNREAL framework.
Figure 3: Fluctuation of RBO values between rankings as iterations progress.
Figure 4: Sensitivity analysis.
Figure 5: Illustration of geometric imbalance across different GNN encoder cases.
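Figure 3 tracks rank-biased overlap (RBO) between rankings as iterations progress. RBO (Webber, Moffat and Zobel, 2010) is a top-weighted agreement score: at each depth $d$ it takes the overlap of the two length-$d$ prefixes divided by $d$ and weights it by $p^{d-1}$. A minimal sketch of the truncated (prefix) form $\mathrm{RBO}@D = (1-p)\sum_{d=1}^{D} p^{d-1}\,|A_{1:d}\cap B_{1:d}|/d$, assuming rankings without ties or duplicates; which RBO variant and which persistence $p$ the paper uses are not given here, so $p = 0.9$ is an assumption.

```python
def rbo_truncated(a, b, p=0.9, depth=None):
    """Truncated rank-biased overlap of two rankings (lists without duplicates).

    Computes (1 - p) * sum_{d=1..depth} p**(d-1) * |a[:d] & b[:d]| / d,
    updating the prefix overlap incrementally.
    """
    if depth is None:
        depth = min(len(a), len(b))
    seen_a, seen_b = set(), set()
    overlap, total = 0, 0.0
    for d in range(1, depth + 1):
        x, y = a[d - 1], b[d - 1]
        if x == y:
            overlap += 1
        else:
            # x joins the overlap if b already listed it, and vice versa.
            overlap += (x in seen_b) + (y in seen_a)
        seen_a.add(x)
        seen_b.add(y)
        total += p ** (d - 1) * overlap / d
    return (1 - p) * total

print(round(rbo_truncated([1, 2, 3], [1, 2, 3]), 4))  # → 0.271
print(round(rbo_truncated([1, 2, 3], [3, 2, 1]), 4))  # → 0.126
print(rbo_truncated([1, 2, 3], [4, 5, 6]))            # → 0.0
```

For two identical rankings of length $D$ the truncated sum equals $1 - p^D$ rather than 1, because the omitted deeper terms are all non-negative; this prefix form is therefore a lower bound on the full score.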

Theorems & Definitions (9)

Definition 1: $\epsilon$-Geometric Imbalance on the Riemannian Manifold
Theorem 1: Geometric Imbalance vs. Information Entropy
Theorem 2: Imbalance Ratio vs. Geometric Imbalance
Definition 2: Confidence Rankings (CR)
Definition 3: Geometric Rankings (GR)
Theorem 3
Proof of Theorem 1 (Message Passing Perspective)
Proof of Theorem 2
Proof of Theorem 3
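Definitions 2 and 3 introduce confidence rankings (CR) and geometric rankings (GR), and the TL;DR describes Node-Reordering as fusing geometry and confidence while gradually shifting reliance from geometry to confidence. A minimal sketch, assuming CR orders nodes by the classifier's top-class probability, GR orders them by distance to their assigned cluster center, and the two are blended by rank averaging with a linear schedule $\alpha_t = t/T$; the blending rule, the schedule, and the function names are hypothetical.

```python
import numpy as np

def to_ranks(scores, descending):
    """Convert scores to 0-based ranks (0 = best); ties keep index order."""
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores if descending else scores, kind="stable")
    ranks = np.empty(len(scores), dtype=int)
    ranks[order] = np.arange(len(scores))
    return ranks

def reorder_nodes(confidence, geo_dist, t, T):
    """Hypothetical fusion of a confidence ranking (CR) and a geometric ranking (GR).

    alpha = t / T moves the weight from GR (early iterations) to CR (late ones).
    """
    cr = to_ranks(confidence, descending=True)   # most confident first
    gr = to_ranks(geo_dist, descending=False)    # closest to its cluster center first
    alpha = min(max(t / T, 0.0), 1.0)
    fused = (1.0 - alpha) * gr + alpha * cr
    return np.argsort(fused, kind="stable")      # node indices, best first

confidence = np.array([0.9, 0.5, 0.7])
geo_dist = np.array([3.0, 1.0, 2.0])
print(reorder_nodes(confidence, geo_dist, t=0, T=10))   # → [1 2 0] (geometry only)
print(reorder_nodes(confidence, geo_dist, t=10, T=10))  # → [0 2 1] (confidence only)
```

Averaging ranks rather than raw scores sidesteps the different scales of probabilities and distances; a non-linear schedule or a score-level fusion would be equally valid design choices under this sketch.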
