Edge Classification on Graphs: New Directions in Topological Imbalance

Xueqi Cheng, Yu Wang, Yunchao Liu, Yuying Zhao, Charu C. Aggarwal, Tyler Derr

TL;DR

The paper addresses the underexplored problem of topological imbalance in edge classification on graphs. It introduces Topological Entropy (TE) as a metric to quantify local class distribution variance around each edge, and develops a two-pronged solution: Topological Reweight (R_t) and TE Wedge-based Mixup (X_tw), forming the TopoEdge framework. TE reweight emphasizes high-TE edges while TE wedge-based mixup generates synthetic edges within high-TE wedges to improve generalization; the methods are integrated into an end-to-end training scheme. Experiments on six real-world imbalanced datasets across multiple GNN backbones show consistent gains in balanced accuracy and Macro-F1, establishing a new benchmark for imbalanced edge classification. The work offers a principled topology-centric view for edge labeling and suggests future directions for topology-aware learning on graphs.
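Balanced accuracy and Macro-F1 are used here because plain accuracy can look strong on imbalanced edge labels even when the minority class is never predicted. The sketch below is an illustration only, not the authors' evaluation code; the helper names `per_class_counts`, `balanced_accuracy`, and `macro_f1` and the toy labels are assumptions made for this example.

```python
def per_class_counts(y_true, y_pred):
    """Return {class: (tp, fp, fn)} for every class seen in y_true or y_pred."""
    classes = sorted(set(y_true) | set(y_pred))
    counts = {}
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        counts[c] = (tp, fp, fn)
    return counts


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall over the classes that occur in y_true."""
    counts = per_class_counts(y_true, y_pred)
    recalls = [tp / (tp + fn) for tp, fp, fn in counts.values() if tp + fn > 0]
    return sum(recalls) / len(recalls)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1; a class with no hits scores 0."""
    counts = per_class_counts(y_true, y_pred)
    f1_scores = []
    for tp, fp, fn in counts.values():
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy example (hypothetical labels): 8 majority edges, 2 minority edges,
# and a classifier that always predicts the majority class.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 10

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy)                      # 0.8
print(balanced_accuracy(y_true, y_pred))   # 0.5
print(round(macro_f1(y_true, y_pred), 4))  # 0.4444
```

In this toy case the always-majority classifier reaches 80% plain accuracy, while both class-balanced metrics expose that the minority class is missed entirely.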

Abstract

Recent years have witnessed the remarkable success of applying graph machine learning (GML) to node/graph classification and link prediction. However, the edge classification task, which enjoys numerous real-world applications such as social network analysis and cybersecurity, has not seen significant advancement. To address this gap, our study pioneers a comprehensive approach to edge classification. We identify a novel "Topological Imbalance Issue", which arises from the skewed distribution of edges across different classes, affecting the local subgraph of each edge and harming the performance of edge classification. Inspired by recent studies in node classification showing that performance discrepancies exist across varying local structural patterns, we investigate whether the performance discrepancy in topologically imbalanced edge classification can also be mitigated by characterizing the local class distribution variance. To this end, we introduce Topological Entropy (TE), a novel topology-based metric that measures the topological imbalance for each edge. Our empirical studies confirm that TE effectively measures local class distribution variance, and indicate that prioritizing edges with high TE values can help address the issue of topological imbalance. Based on this, we develop two strategies, Topological Reweighting and TE Wedge-based Mixup, to focus training on (synthetic) edges based on their TEs. While topological reweighting directly manipulates training edge weights according to TE, our wedge-based mixup interpolates synthetic edges between high-TE wedges. Ultimately, we integrate these strategies into a novel topological imbalance strategy for edge classification: TopoEdge. Through extensive experiments, we demonstrate the efficacy of our proposed strategies on newly curated datasets and thus establish a new benchmark for (imbalanced) edge classification.
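To make the idea of "local class distribution variance" concrete, the sketch below computes a simple entropy score per edge and turns it into training weights. This is an assumption-laden illustration, not the paper's definition of TE or of Topological Reweighting, neither of which is given in this summary. It assumes the local neighborhood of an edge is the set of edges sharing an endpoint with it (itself included), uses Shannon entropy in bits, and uses a hypothetical weighting rule `1 + alpha * entropy`; the names `local_edge_entropy`, `entropy_weights`, and `alpha` are invented for this example.

```python
import math
from collections import Counter, defaultdict


def local_edge_entropy(edges, labels):
    """Shannon entropy (bits) of edge labels around each edge.

    Assumption: the neighborhood of edge (u, v) is every edge incident
    to u or v, including the edge itself.
    """
    incident = defaultdict(set)
    for idx, (u, v) in enumerate(edges):
        incident[u].add(idx)
        incident[v].add(idx)

    entropies = []
    for u, v in edges:
        neighborhood = incident[u] | incident[v]
        counts = Counter(labels[i] for i in neighborhood)
        total = sum(counts.values())
        h = -sum((c / total) * math.log2(c / total) for c in counts.values())
        entropies.append(h)
    return entropies


def entropy_weights(entropies, alpha=1.0):
    """Hypothetical rule: weight = 1 + alpha * entropy, rescaled to mean 1."""
    raw = [1.0 + alpha * h for h in entropies]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]


# Toy path graph 0-1-2-3-4 with two edge classes (hypothetical data).
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
labels = [0, 0, 1, 1]

entropies = local_edge_entropy(edges, labels)
weights = entropy_weights(entropies)
for edge, h, w in zip(edges, entropies, weights):
    print(edge, round(h, 3), round(w, 3))
```

In the toy graph, the two middle edges sit where the classes meet, so their neighborhoods mix both labels and they receive larger weights than the two end edges, whose neighborhoods contain a single class.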

Paper Structure (20 sections, 15 equations, 8 figures, 4 tables, 1 algorithm)

This paper contains 20 sections, 15 equations, 8 figures, 4 tables, 1 algorithm.

Figures (8)

  • Figure 1: After equipping GNNs with off-the-shelf quantity reweighting ($R_q$), GCN achieves a worse overall Macro-F1, as shown in (a); the same is observed with GAT in (b).
  • Figure 2: We visualize the validation F1 score for the majority and minority classes of GCN and GCN+$R_q$ across different edge categories on the Epinions dataset. Specifically, edge categories are defined by the two node endpoints being categorized as mostly majority (M), mostly minority (m), or uncertain (U), and we visualize the distribution of edges within the dataset.
  • Figure 3: We visualize the empirical discrepancy (in terms of training accuracy) between majority and minority edges when grouped by their TE (along with the number of edges in each group) and using GCN on the Epinions dataset.
  • Figure 4: After applying $R_{tq}$, which strategically emphasizes edges with high local distribution variance on top of $R_q$, significant performance improvement can be observed for the majority class (a) and the minority class (b). The largest improvements occur for edges in categories with relatively large class distribution variance (i.e., the MU, Mm, UU, and Um categories).
  • Figure 5: Overview of the Topological Wedge-based mixup
  • ...and 3 more figures

Theorems & Definitions (2)

  • Definition 1
  • Definition 2