Graph Size-imbalanced Learning with Energy-guided Structural Smoothing

Jiawen Qin, Pengfeng Huang, Qingyun Sun, Cheng Ji, Xingcheng Fu, Jianxin Li

TL;DR

This work tackles the challenge of size-imbalanced graph classification where head graphs and tail graphs exhibit differing structural distributions that impair GNN performance. It introduces SIMBA, a two-level framework that first learns size-invariant graph embeddings and then builds a graphs-to-graph abstraction to propagate information across similar graphs, followed by an energy-based belief propagation that re-weights graphs to smooth local feature discrepancies. The key contributions are (i) a discriminative, size-invariant graph embedding approach, (ii) a graphs-to-graph propagation mechanism that links graphs across the head-tail spectrum, and (iii) an energy-based re-weighting scheme that emphasizes compatible graphs during training. Empirical results on five real-world size-imbalanced datasets show that SIMBA consistently improves both head and tail graph classification performance and generalizes across different GNN backbones, highlighting its practical impact for robust graph-level learning in imbalanced settings.

Abstract

Graphs are a prevalent data structure for representing relationships between entities and are frequently used to depict and simulate systems such as molecules and social networks. However, real-world graphs usually suffer from a size-imbalance problem in multi-graph classification, i.e., a long-tailed distribution with respect to the number of nodes. Recent studies find that off-the-shelf Graph Neural Networks (GNNs) perform worse under long-tailed settings. We investigate this phenomenon and discover that the long-tailed graph distribution greatly exacerbates the discrepancies in structural features. To alleviate this problem, we propose a novel energy-based size-imbalanced learning framework named SIMBA, which smooths the features between head and tail graphs and re-weights them based on energy propagation. Specifically, we construct a higher-level graph abstraction named Graphs-to-Graph according to the correlations between graphs, which links otherwise independent graphs and smooths the structural discrepancies. We further devise an energy-based message-passing belief propagation method that re-weights less compatible graphs during training and further smooths local feature discrepancies. Extensive experimental results on five public size-imbalanced datasets demonstrate the superior effectiveness of the model for size-imbalanced graph classification tasks.
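
The Graphs-to-Graph idea of linking independent graphs by their correlations can be illustrated with a minimal sketch. It assumes that each graph has already been encoded into a fixed-length embedding and that "correlation" is measured by cosine similarity; the similarity measure, the function names `cosine` and `build_graphs_to_graph`, and the toy data are illustrative assumptions, not details taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def build_graphs_to_graph(embeddings, k):
    """Map each graph index to the indices of its k most similar other graphs.

    Hypothetical helper: the similarity measure (cosine) is an assumption.
    """
    n = len(embeddings)
    neighbors = {}
    for i in range(n):
        scored = [(cosine(embeddings[i], embeddings[j]), j)
                  for j in range(n) if j != i]
        # Sort by descending similarity; ties go to the smaller index.
        scored.sort(key=lambda s: (-s[0], s[1]))
        neighbors[i] = [j for _, j in scored[:k]]
    return neighbors

# Toy embeddings: graphs 0 and 1 point one way, graphs 2 and 3 another.
toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
g2g = build_graphs_to_graph(toy, k=1)
```

In this toy case each graph is linked to the one other graph that shares its direction in the embedding space. The resulting adjacency is the kind of structure over which information could then be propagated between head and tail graphs.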

Paper Structure

This paper contains 24 sections, 12 equations, 3 figures, 4 tables, 1 algorithm.

Figures (3)

  • Figure 1: (a) The graph sizes in three benchmark datasets follow a power-law-like distribution. (b) We apply GIN to acquire structural features of graphs and use the Central Moment Discrepancy (CMD) (Zellinger et al., 2019) to evaluate the structural discrepancies under the long-tailed and relatively size-balanced graph distributions. A higher CMD score indicates a larger variation in feature distribution.
  • Figure 2: An illustration of the SIMBA architecture. (1) SIMBA derives graph representations through a discriminative graph encoder. (2) SIMBA establishes a higher-level graph abstraction (graphs-to-graph) to acquire extra structural supervision from the top-$k$ nearest graphs in the latent space. (3) SIMBA propagates the energy belief of each instance along the edges of the constructed graphs-to-graph and re-weights each instance according to the ordered energy score list of all instances.
  • Figure 3: Performance analysis of SIMBA.
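
The CMD score used in Figure 1(b) can be sketched as follows. This is a minimal standard-library version of the empirical Central Moment Discrepancy as commonly defined (difference of means plus differences of central moments up to a chosen order, each scaled by the feature range). The function name `cmd`, the default `max_order=5`, and the assumption that features lie in `[a, b] = [0, 1]` are illustrative choices, not settings reported in the paper.

```python
import math

def _mean(rows):
    """Per-dimension mean of a list of feature vectors."""
    n, d = len(rows), len(rows[0])
    return [sum(r[j] for r in rows) / n for j in range(d)]

def _central_moment(rows, mean, order):
    """Per-dimension central moment of the given order."""
    n, d = len(rows), len(rows[0])
    return [sum((r[j] - mean[j]) ** order for r in rows) / n for j in range(d)]

def _l2(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(u, v)))

def cmd(x, y, max_order=5, a=0.0, b=1.0):
    """Empirical CMD between two samples of feature vectors.

    Assumes features are bounded in [a, b]; max_order is a hypothetical default.
    """
    span = abs(b - a)
    mx, my = _mean(x), _mean(y)
    score = _l2(mx, my) / span  # first-order term: difference of means
    for k in range(2, max_order + 1):
        ck_x = _central_moment(x, mx, k)
        ck_y = _central_moment(y, my, k)
        score += _l2(ck_x, ck_y) / span ** k  # higher-order central moments
    return score

# Toy samples: y is x shifted by 0.5 in every dimension.
x = [[0.1, 0.2], [0.3, 0.4]]
y = [[0.6, 0.7], [0.8, 0.9]]
same = cmd(x, x)
shifted = cmd(x, y)
```

Because the two toy samples differ only by a shift, their central moments agree and the score is driven almost entirely by the difference of means. Identical samples give a score of zero, which matches the caption's reading that a higher CMD indicates a larger gap between feature distributions.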