Imbalanced Graph Classification with Multi-scale Oversampling Graph Neural Networks
Rongrong Ma, Guansong Pang, Ling Chen
TL;DR
MOSGNN is a multi-scale oversampling GNN for imbalanced graph classification that augments minority graphs at three scales: subgraph, graph, and pairwise graphs. It jointly optimizes three objectives (graph-level classification, pairwise-graph relation prediction, and subgraph classification based on multiple instance learning, MIL) through the overall objective $\mathcal{L}=L^g + \lambda L^p + \beta L^s$, where $\lambda$ and $\beta$ weight the pairwise and subgraph terms, thereby enriching minority graph representations with intra- and inter-graph information. On 16 imbalanced graph datasets, MOSGNN significantly outperforms five state-of-the-art models, and it remains adaptable to different imbalanced learning loss functions and GNN backbones, making it a generic framework for imbalanced graph learning.
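As a rough illustration of the objective $\mathcal{L}=L^g + \lambda L^p + \beta L^s$, the sketch below combines three already-computed scalar losses into one value. The function name `overall_objective`, its argument names, and the default weights of 1.0 are hypothetical choices made for this example and are not taken from the paper.

```python
def overall_objective(loss_graph, loss_pair, loss_subgraph, lam=1.0, beta=1.0):
    """Weighted sum L = L^g + lam * L^p + beta * L^s.

    loss_graph    -- graph-level classification loss (L^g)
    loss_pair     -- pairwise-graph relation prediction loss (L^p)
    loss_subgraph -- MIL-based subgraph classification loss (L^s)
    lam, beta     -- weights for the two additional terms (defaults are assumed)
    """
    return loss_graph + lam * loss_pair + beta * loss_subgraph


# Example with made-up loss values.
total = overall_objective(0.5, 0.2, 0.1, lam=2.0, beta=3.0)
```

Setting `lam=0` and `beta=0` reduces the sum to the graph-level loss alone, which is one simple way to check how much the other two terms contribute.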
Abstract
One main challenge in imbalanced graph classification is to learn expressive representations of the graphs in under-represented (minority) classes. Existing generic imbalanced learning methods, such as oversampling and imbalanced learning loss functions, can be adopted to enable graph representation learning models to cope with this challenge. However, these methods often operate directly on the graph representations, ignoring the rich discriminative information within the graphs and in their interactions. To tackle this issue, we introduce a novel multi-scale oversampling graph neural network (MOSGNN) that learns expressive minority graph representations based on intra- and inter-graph semantics resulting from oversampled graphs at multiple scales: subgraph, graph, and pairwise graphs. It achieves this by jointly optimizing subgraph-level, graph-level, and pairwise-graph learning tasks to learn the discriminative information embedded within and between the minority graphs. Extensive experiments on 16 imbalanced graph datasets show that MOSGNN i) significantly outperforms five state-of-the-art models, and ii) offers a generic framework into which different advanced imbalanced learning loss functions can be easily plugged to obtain significantly improved classification performance.
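To make the graph-level and pairwise scales more concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes binary labels with `1` as the minority class, uses plain random replication for graph-level oversampling, and labels each pair by the number of minority graphs it contains. The helper names `oversample_minority` and `build_pairs` and the relation encoding are hypothetical.

```python
import itertools
import random


def oversample_minority(labels, minority_label=1, seed=0):
    """Return graph indices with minority indices replicated at random
    until both classes have the same count (assumed balancing rule)."""
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    if not minority:
        # Nothing to replicate; return the original indices unchanged.
        return list(range(len(labels)))
    n_extra = max(0, len(majority) - len(minority))
    extra = [rng.choice(minority) for _ in range(n_extra)]
    return majority + minority + extra


def build_pairs(labels, minority_label=1):
    """Build all unordered graph pairs (i, j, relation), where relation is
    the number of minority graphs in the pair: 0, 1, or 2 (assumed encoding)."""
    pairs = []
    for i, j in itertools.combinations(range(len(labels)), 2):
        relation = int(labels[i] == minority_label) + int(labels[j] == minority_label)
        pairs.append((i, j, relation))
    return pairs


# Toy example: four majority graphs and one minority graph.
toy_labels = [0, 0, 0, 0, 1]
balanced_indices = oversample_minority(toy_labels)
toy_pairs = build_pairs(toy_labels)
```

In the toy example the single minority graph appears in four of the ten pairs, which illustrates how pairing lets each minority graph contribute to many training samples without copying its representation.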
