Geometric Imbalance in Semi-Supervised Node Classification

Liang Yan, Shengzhong Zhang, Bisheng Li, Menglin Yang, Chen Yang, Min Zhou, Weiyang Ding, Yutong Xie, Zengfeng Huang

TL;DR

This work identifies geometric imbalance (GI) as an angular ambiguity in unit-sphere embeddings produced by graph neural networks under class imbalance, formalizing its link to prediction uncertainty via a von Mises–Fisher model on the hypersphere. It offers a unified, modular framework called UNREAL to mitigate GI through three components: DPAM for aligning pseudo-labels from clustering and prediction, Node-Reordering to fuse geometry and confidence while gradually shifting reliance from geometry to confidence, and DGIS to discard geometrically ambiguous samples. Theoretical results connect GI to entropy and imbalance ratio, while extensive experiments on nine benchmarks (including large-scale datasets) show consistent gains over state-of-the-art baselines, particularly as imbalance intensifies. The approach advances both theory and practice for robust semi-supervised node classification on imbalanced graphs, with potential extensions to other graph tasks.
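The idea of angular ambiguity on the unit sphere, and of discarding ambiguous samples, can be illustrated with a small sketch. This is not the paper's DGIS procedure, whose exact definition is not given in this summary. It assumes a hypothetical ambiguity score: the gap between a node's angle to its closest class centroid and its angle to the second-closest one. The names `angular_margin`, `filter_ambiguous`, and `min_margin` are illustrative only.

```python
import math

def normalize(v):
    """Project a vector onto the unit sphere (L2 normalization)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(u, v):
    """Cosine similarity of two unit vectors (their dot product)."""
    return sum(a * b for a, b in zip(u, v))

def angular_margin(embedding, centroids):
    """Gap between the angles to the second-closest and closest centroid.

    Assumed ambiguity score (not taken from the paper): a small margin
    means the node is almost equally close, in angle, to two class centers.
    """
    z = normalize(embedding)
    angles = sorted(
        math.acos(max(-1.0, min(1.0, cosine(z, normalize(c)))))
        for c in centroids
    )
    return angles[1] - angles[0]

def filter_ambiguous(embeddings, centroids, min_margin):
    """Return indices of nodes whose angular margin is at least min_margin."""
    return [
        i for i, e in enumerate(embeddings)
        if angular_margin(e, centroids) >= min_margin
    ]

# Two class centroids and three toy node embeddings in 2D.
centroids = [[1.0, 0.0], [0.0, 1.0]]
embeddings = [[0.9, 0.1], [0.5, 0.5], [0.1, 2.0]]
kept = filter_ambiguous(embeddings, centroids, min_margin=0.5)
```

In this toy setup the second node lies exactly between the two centroids, so its margin is zero and it is dropped, while the other two nodes are kept.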

Abstract

Class imbalance in graph data presents a significant challenge for effective node classification, particularly in semi-supervised scenarios. In this work, we formally introduce the concept of geometric imbalance, which captures how message passing on class-imbalanced graphs leads to geometric ambiguity among minority-class nodes in the Riemannian manifold embedding space. We provide a rigorous theoretical analysis of geometric imbalance on the Riemannian manifold and propose a unified framework that explicitly mitigates it through pseudo-label alignment, node reordering, and ambiguity filtering. Extensive experiments on diverse benchmarks show that our approach consistently outperforms existing methods, especially under severe class imbalance. Our findings offer new theoretical insights and practical tools for robust semi-supervised node classification.

Paper Structure

This paper contains 65 sections, 3 theorems, 31 equations, 12 figures, 36 tables, and 1 algorithm.

Key Result

Theorem 1

Let $\mathcal{D}^{\text{unlabel}} \subset \mathcal{V}$ be the set of unlabeled nodes, and let $V_{\text{minor}} := \{ v \in \mathcal{V} \mid y_v \in \mathcal{C}_{\text{minor}} \}$ denote the set of all nodes whose ground-truth labels belong to the minority class set $\mathcal{C}_{\text{minor}} \subset \{1, \dots\}$ [...] Then the average information entropy $\hat{H}$ of pseudo-label predictions over $\mathcal{D}^{\text{unlabel}}$ [...]

Figures (12)

  • Figure 1: Illustration and quantitative analysis of Geometric Imbalance (GI). (a) Conceptual illustration of GI under different pretrained and fine-tuned settings. (b)-(e): t-SNE visualizations of node embeddings in the Cora dataset under four representative cases, showing intra-class compactness and inter-class separation patterns. (f)-(g): Quantitative relationships between GI and (f) average entropy, and (g) class imbalance ratio.
  • Figure 2: The pipeline of our UNREAL framework.
  • Figure 3: Fluctuation of RBO values between rankings as iterations progress.
  • Figure 4: Sensitivity analysis.
  • Figure 5: Illustration of geometric imbalance across different GNN encoder cases.
  • ...and 7 more figures
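
Figure 3 refers to RBO (rank-biased overlap), a standard similarity measure between two rankings that weights agreement near the top more heavily. The sketch below shows a truncated form of RBO for two finite rankings without repeated items. The persistence parameter `p` and the truncation at the common depth are assumptions; this summary does not state which RBO variant or parameter value the paper uses.

```python
def rbo_truncated(ranking_a, ranking_b, p=0.9):
    """Truncated rank-biased overlap of two rankings, up to their common depth.

    At each depth d, the size of the overlap of the two top-d prefixes is
    divided by d and weighted by p**(d - 1). The factor (1 - p) normalizes
    the weights of the infinite series, so with truncation at depth k two
    identical rankings score 1 - p**k rather than exactly 1.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    depth = min(len(ranking_a), len(ranking_b))
    seen_a, seen_b = set(), set()
    score = 0.0
    for d in range(1, depth + 1):
        seen_a.add(ranking_a[d - 1])
        seen_b.add(ranking_b[d - 1])
        agreement = len(seen_a & seen_b) / d  # overlap fraction at depth d
        score += (p ** (d - 1)) * agreement
    return (1.0 - p) * score

same = rbo_truncated([1, 2, 3], [1, 2, 3], p=0.5)
swapped = rbo_truncated([1, 2, 3], [2, 1, 3], p=0.5)
disjoint = rbo_truncated([1, 2, 3], [4, 5, 6], p=0.5)
```

A swap at the top of the list lowers the score more than a swap further down, which is why RBO suits comparisons where the highest-ranked nodes matter most.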

Theorems & Definitions (9)

  • Definition 1: $\epsilon$-Geometric Imbalance on the Riemannian Manifold
  • Theorem 1: Geometric Imbalance vs. Information Entropy
  • Theorem 2: Imbalance Ratio vs. Geometric Imbalance
  • Definition 2: Confidence Rankings (CR)
  • Definition 3: Geometric Rankings (GR)
  • Theorem 3
  • Proof of Theorem 1 (Message Passing Perspective)
  • Proof of Theorem 2
  • Proof of Theorem 3
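
Definitions 2 and 3 name two rankings, Confidence Rankings (CR) and Geometric Rankings (GR), which the TL;DR says are fused while reliance shifts gradually from geometry to confidence. The sketch below is one possible reading of that idea, not the paper's Node-Reordering rule: it assumes CR orders nodes by descending prediction confidence, GR orders them by ascending distance to a class center, and the two are combined by a weighted sum of rank positions with a linearly decaying geometry weight. All function names and the linear schedule are hypothetical.

```python
def rank_positions(scores, higher_is_better):
    """Map each node index to its rank position (0 = best)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i],
                   reverse=higher_is_better)
    return {node: pos for pos, node in enumerate(order)}

def fuse_rankings(confidence, distance, geometry_weight):
    """Order nodes by a weighted sum of geometric and confidence ranks.

    Assumption: higher confidence is better, smaller distance is better.
    Ties are broken by node index.
    """
    conf_rank = rank_positions(confidence, higher_is_better=True)
    geo_rank = rank_positions(distance, higher_is_better=False)
    fused = {
        i: geometry_weight * geo_rank[i] + (1.0 - geometry_weight) * conf_rank[i]
        for i in range(len(confidence))
    }
    return sorted(fused, key=lambda i: (fused[i], i))

def geometry_weight_at(iteration, total_iterations):
    """Hypothetical linear schedule: weight 1 at the start, 0 at the end."""
    return 1.0 - iteration / max(1, total_iterations - 1)

confidence = [0.9, 0.6, 0.8]   # toy prediction confidences per node
distance = [0.5, 0.1, 0.3]     # toy distances to the assigned class center

early = fuse_rankings(confidence, distance, geometry_weight_at(0, 5))
late = fuse_rankings(confidence, distance, geometry_weight_at(4, 5))
```

Under these assumptions the first iteration orders nodes purely by geometry and the last purely by confidence, with intermediate iterations blending the two.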