Edge Classification on Graphs: New Directions in Topological Imbalance
Xueqi Cheng, Yu Wang, Yunchao Liu, Yuying Zhao, Charu C. Aggarwal, Tyler Derr
TL;DR
The paper addresses the underexplored problem of topological imbalance in edge classification on graphs. It introduces Topological Entropy (TE) as a metric to quantify local class distribution variance around each edge, and develops a two-pronged solution, Topological Reweight (R_t) and TE Wedge-based Mixup (X_tw), which together form the TopoEdge framework. Topological Reweight emphasizes high-TE edges, while TE Wedge-based Mixup generates synthetic edges within high-TE wedges to improve generalization; both are integrated into an end-to-end training scheme. Experiments on six real-world imbalanced datasets across multiple GNN backbones show consistent gains in balanced accuracy and Macro-F1, establishing a new benchmark for imbalanced edge classification. The work offers a principled topology-centric view of edge labeling and suggests future directions for topology-aware learning on graphs.
Abstract
Recent years have witnessed the remarkable success of applying graph machine learning (GML) to node/graph classification and link prediction. However, the edge classification task, which has numerous real-world applications such as social network analysis and cybersecurity, has not seen significant advancement. To address this gap, our study pioneers a comprehensive approach to edge classification. We identify a novel "Topological Imbalance Issue", which arises from the skewed distribution of edges across different classes, affecting the local subgraph of each edge and harming the performance of edge classification. Inspired by recent studies in node classification showing that performance discrepancies arise across varying local structural patterns, we investigate whether the performance discrepancy in topologically imbalanced edge classification can also be mitigated by characterizing the local class distribution variance. To this end, we introduce Topological Entropy (TE), a novel topology-based metric that measures the topological imbalance of each edge. Our empirical studies confirm that TE effectively measures local class distribution variance, and indicate that prioritizing edges with high TE values can help address the issue of topological imbalance. Based on this, we develop two strategies, Topological Reweighting and TE Wedge-based Mixup, to focus training on (synthetic) edges based on their TEs. While Topological Reweighting directly manipulates training edge weights according to TE, our wedge-based mixup interpolates synthetic edges between high-TE wedges. Ultimately, we integrate these strategies into a novel topological imbalance strategy for edge classification: TopoEdge. Through extensive experiments, we demonstrate the efficacy of our proposed strategies on newly curated datasets and thus establish a new benchmark for (imbalanced) edge classification.
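The abstract does not give the formal definition of TE, the reweighting rule, or the mixup procedure, so the following is only an illustrative sketch of the general idea, not the paper's method. It assumes (hypothetically) that the local class distribution around an edge can be summarized by the Shannon entropy of the labels on edges sharing an endpoint with it, that training weights grow linearly with that score, and that a synthetic edge is a convex combination of two edge feature vectors. The function names and the `alpha` and `lam` parameters are invented for this example. In practice only training-set labels would be available for the neighborhood counts.

```python
import math
from collections import Counter, defaultdict


def local_label_entropy(edges, labels):
    """Assumed stand-in for a TE-like score: for each edge, the Shannon
    entropy (in nats) of the labels of all edges sharing an endpoint with
    it, the edge itself included."""
    incident = defaultdict(set)  # node -> indices of edges touching it
    for idx, (u, v) in enumerate(edges):
        incident[u].add(idx)
        incident[v].add(idx)

    scores = []
    for u, v in edges:
        neighborhood = incident[u] | incident[v]
        counts = Counter(labels[j] for j in neighborhood)
        total = sum(counts.values())
        entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
        scores.append(entropy)
    return scores


def entropy_weights(scores, alpha=1.0):
    """Assumed reweighting rule: weight = 1 + alpha * score, rescaled so the
    weights average to 1 and the overall loss scale is unchanged."""
    raw = [1.0 + alpha * s for s in scores]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]


def mix_edge_features(feat_a, feat_b, lam=0.5):
    """Assumed mixup step: convex combination of two edge feature vectors,
    e.g. the two edges of a wedge (a pair of edges sharing one node)."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(feat_a, feat_b)]


# Toy path graph 0-1-2-3-4 with one minority-class edge (2, 3).
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
labels = [0, 0, 1, 0]

scores = local_label_entropy(edges, labels)
weights = entropy_weights(scores, alpha=1.0)
synthetic = mix_edge_features([1.0, 0.0], [0.0, 1.0], lam=0.25)
```

In this toy graph, edge (0, 1) sits in a single-class neighborhood and receives a score of zero, while the edges near the minority-class edge receive higher scores and therefore larger weights. Such weights could be passed as per-sample weights to a standard classification loss.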
