Graph Size-imbalanced Learning with Energy-guided Structural Smoothing
Jiawen Qin, Pengfeng Huang, Qingyun Sun, Cheng Ji, Xingcheng Fu, Jianxin Li
TL;DR
This work tackles size-imbalanced graph classification, where head graphs and tail graphs exhibit different structural distributions that impair GNN performance. It introduces SIMBA, a two-level framework that first learns size-invariant graph embeddings and then builds a Graphs-to-Graph abstraction to propagate information across similar graphs; an energy-based belief propagation then re-weights graphs to smooth local feature discrepancies. The key contributions are (i) a discriminative, size-invariant graph embedding approach, (ii) a Graphs-to-Graph propagation mechanism that links graphs across the head-tail spectrum, and (iii) an energy-based re-weighting scheme that emphasizes compatible graphs during training. Empirical results on five real-world size-imbalanced datasets show that SIMBA consistently improves classification performance on both head and tail graphs and generalizes across different GNN backbones, highlighting its practical value for robust graph-level learning in imbalanced settings.
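The Graphs-to-Graph idea (treat each graph as a node and connect similar graphs so information can flow between head and tail graphs) can be illustrated with a minimal sketch. The summary does not say how graph correlations are measured, so this sketch assumes cosine similarity between graph-level embeddings and a symmetrized k-nearest-neighbor rule; `build_graphs_to_graph`, `smooth`, `k`, and `alpha` are hypothetical names, and the single mixing step stands in for whatever propagation SIMBA actually uses.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two graph-level embeddings (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_graphs_to_graph(emb: List[Vector], k: int) -> List[List[int]]:
    """Hypothetical Graphs-to-Graph construction (assumption: cosine kNN).

    Each graph becomes a node linked to its k most similar graphs;
    edges are symmetrized so the abstraction is undirected.
    """
    n = len(emb)
    neighbors = [set() for _ in range(n)]
    for i in range(n):
        others = [j for j in range(n) if j != i]
        others.sort(key=lambda j: cosine(emb[i], emb[j]), reverse=True)
        for j in others[:k]:
            neighbors[i].add(j)
            neighbors[j].add(i)  # symmetrize: an edge i-j is visible from both ends
    return [sorted(s) for s in neighbors]


def smooth(emb: List[Vector], neighbors: List[List[int]], alpha: float = 0.5) -> List[List[float]]:
    """One propagation step: blend each embedding with the mean of its neighbors' embeddings."""
    out = []
    for i, vec in enumerate(emb):
        nbrs = neighbors[i]
        if not nbrs:
            out.append(list(vec))  # an isolated graph keeps its own embedding
            continue
        mean = [sum(emb[j][d] for j in nbrs) / len(nbrs) for d in range(len(vec))]
        out.append([(1 - alpha) * v + alpha * m for v, m in zip(vec, mean)])
    return out


if __name__ == "__main__":
    # Toy embeddings: graphs 0/1 are similar to each other, as are graphs 2/3.
    emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    nbrs = build_graphs_to_graph(emb, k=1)
    print(nbrs)  # → [[1], [0], [3], [2]]
    print(smooth(emb, nbrs, alpha=0.5))
```

With these toy embeddings each graph links to its closest counterpart, and one smoothing step pulls linked embeddings toward each other; in a size-imbalanced setting the hope is that such links let a tail graph borrow structural information from similar head graphs.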
Abstract
Graphs are a prevalent data structure for representing relationships between entities and are frequently used to depict and simulate systems such as molecules and social networks. However, real-world graph datasets often suffer from size imbalance in multi-graph classification, i.e., the number of nodes per graph follows a long-tailed distribution. Recent studies find that off-the-shelf Graph Neural Networks (GNNs) lose performance under such long-tailed settings. We investigate this phenomenon and find that the long-tailed size distribution greatly exacerbates discrepancies in structural features. To alleviate this problem, we propose SIMBA, a novel energy-based size-imbalanced learning framework that smooths features between head and tail graphs and re-weights them based on energy propagation. Specifically, we construct a higher-level graph abstraction named Graphs-to-Graph, which links otherwise independent graphs according to their correlations and smooths their structural discrepancies. We further devise an energy-based message-passing belief propagation method that re-weights less compatible graphs during training and further smooths local feature discrepancies. Extensive experiments on five public size-imbalanced datasets demonstrate the effectiveness of SIMBA for size-imbalanced graph classification.
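The energy-based re-weighting step can be sketched as follows. The abstract does not give SIMBA's energy function or propagation rule, so this sketch swaps in three assumptions: the free-energy score E = -T * logsumexp(f(x) / T) used in energy-based out-of-distribution detection, a simple neighbor-averaging update as a stand-in for belief propagation over the Graphs-to-Graph adjacency, and an exp(-E) mapping from energy to loss weight. Following the TL;DR's description that compatible graphs are emphasized, low-energy graphs are assumed to be the more compatible ones and receive larger weights. All function names (`energy`, `propagate_energy`, `energy_weights`) and parameters are hypothetical.

```python
import math
from typing import List, Sequence


def energy(logits: Sequence[float], temperature: float = 1.0) -> float:
    """Free-energy score E = -T * logsumexp(logits / T); lower energy = more confident prediction."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max before exponentiating for numerical stability
    return -temperature * (m + math.log(sum(math.exp(s - m) for s in scaled)))


def propagate_energy(energies: Sequence[float], neighbors: List[List[int]],
                     alpha: float = 0.5, steps: int = 2) -> List[float]:
    """Hypothetical stand-in for energy belief propagation over the graph-level adjacency.

    At each step a graph keeps a fraction alpha of its own energy and takes the
    rest from the mean energy of its neighbors; isolated graphs are unchanged.
    """
    e = list(energies)
    for _ in range(steps):
        # The comprehension reads the previous step's values, so the update is synchronous.
        e = [
            (alpha * e[i] + (1 - alpha) * sum(e[j] for j in nbrs) / len(nbrs)) if nbrs else e[i]
            for i, nbrs in enumerate(neighbors)
        ]
    return e


def energy_weights(energies: Sequence[float], tau: float = 1.0) -> List[float]:
    """Map energies to per-graph loss weights with mean 1; lower energy -> larger weight."""
    lowest = min(energies)
    raw = [math.exp(-(x - lowest) / tau) for x in energies]  # shift keeps every exponent <= 0
    mean = sum(raw) / len(raw)
    return [r / mean for r in raw]


if __name__ == "__main__":
    logits = [[5.0, 0.0], [0.0, 0.0], [4.0, 1.0]]  # toy per-graph classifier outputs
    neighbors = [[2], [2], [0, 1]]                 # toy Graphs-to-Graph adjacency
    e = propagate_energy([energy(z) for z in logits], neighbors)
    w = energy_weights(e)
    # Weights average to 1; graph 0 (confident) outweighs graph 1 (uncertain).
    print(w)
    # A weighted training loss could then be sum(w_i * loss_i) / len(w).
```

Averaging energies over neighbors means an uncertain graph that sits next to confident ones is pulled toward lower energy, so its weight depends on its neighborhood and not only on its own prediction; this is one plausible reading of "smoothing local feature discrepancies", not a confirmed detail of SIMBA.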
