BuffGraph: Enhancing Class-Imbalanced Node Classification via Buffer Nodes

Qian Wang, Zemin Liu, Zhen Zhang, Bingsheng He

TL;DR

This work tackles class-imbalanced node classification on graphs by addressing heterophily: a buffer node is interposed on every edge to slow and modulate information flow, balancing minority and majority signals. BuffGraph combines three components: buffer-node generation via mixup, a dynamic message-passing framework guided by edge heterophily and trained with the joint loss $L_{total} = L_{pred} + \lambda L_{hetero}$, and a spectral perspective on how buffering reshapes diffusion through the graph Laplacian. Empirically, BuffGraph consistently outperforms strong baselines on five real-world datasets in both natural and artificially imbalanced settings, improving accuracy and macro F1 for minority classes, and its cost scales linearly with graph size in scalability tests. Theoretical and ablation analyses support these contributions and suggest further work on heterophily-aware graph augmentation.

Abstract

Class imbalance in graph-structured data, where minor classes are significantly underrepresented, poses a critical challenge for Graph Neural Networks (GNNs). To balance the classes, existing studies generally generate new minority nodes together with edges that connect them to the original graph. However, this does not prevent majority classes from propagating information to minority nodes along the edges of the original graph, which introduces a bias towards majority classes. To address this, we introduce BuffGraph, which inserts buffer nodes into the graph to modulate the impact of majority classes and improve the representation of minor classes. Extensive experiments across diverse real-world datasets demonstrate that BuffGraph outperforms existing baseline methods in class-imbalanced node classification in both natural and imbalanced settings. Code is available at https://anonymous.4open.science/r/BuffGraph-730A.

Paper Structure

This paper contains 24 sections, 10 equations, 6 figures, 3 tables, 1 algorithm.

Figures (6)

  • Figure 1: Comparative analysis of class-wise accuracy improvements before and after node insertion for the Coauthor-CS (left) and WikiCS (right) datasets. Classes are organized in descending order according to the number of samples per class. The heterophily score depicted corresponds to the average heterophily score across samples within each class, offering insights into the impact of node insertion on class-wise performance considering the underlying heterophily dynamics.
  • Figure 2: BuffGraph overview, where $v_1$, $v_2$, $v_4$, $v_5$ belong to the major class and $v_3$, $v_6$ belong to the minor class. The input graph is shown in (a). A buffer node is then inserted into every edge of the graph, as depicted in (b). The feature of each buffer node is a blend of the features of the two nodes connected by the edge, weighted by $\alpha$ and $1 - \alpha$ respectively. Panel (c) zooms in on $v_4$ to show the neighbor aggregation of BuffGraph: each neighbor node passes messages to $v_4$ both through the buffer node and directly, at the same time, in proportions that depend on the edge's degree of heterophily. The loss function, shown in (d), is calculated during this neighbor aggregation.
  • Figure 3: Trend of balanced accuracy (BAcc.) as the imbalance ratio increases on Amazon-Computers.
  • Figure 4: Scalability study on Coauthor-Physics.
  • Figure 5: Ablation study on Amazon-Computers and WikiCS.
  • ...and 1 more figure
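
The buffer-node insertion described in the Figure 2 caption can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `insert_buffer_nodes`, the edge-list representation, and the fixed scalar `alpha` are assumptions made for illustration. The sketch covers only the insertion step and the feature blend; the heterophily-dependent weighting between the direct path and the buffered path is not shown.

```python
import numpy as np


def insert_buffer_nodes(x, edges, alpha=0.5):
    """Add one buffer node per undirected edge (hypothetical helper).

    x:     (num_nodes, num_features) node feature matrix
    edges: list of (u, v) pairs, each undirected edge listed once
    alpha: blend weight; buffer feature = alpha * x[u] + (1 - alpha) * x[v]

    Returns the augmented feature matrix and the edges that route each
    original edge (u, v) through its buffer node b as (u, b), (b, v).
    The original edges are left untouched for the caller to keep or reweight.
    """
    x = np.asarray(x, dtype=float)
    num_nodes = x.shape[0]
    buffer_feats = []
    buffer_edges = []
    for i, (u, v) in enumerate(edges):
        b = num_nodes + i  # index assigned to the new buffer node
        buffer_feats.append(alpha * x[u] + (1.0 - alpha) * x[v])
        buffer_edges.append((u, b))
        buffer_edges.append((b, v))
    if buffer_feats:
        x_aug = np.vstack([x, np.stack(buffer_feats)])
    else:
        x_aug = x.copy()
    return x_aug, buffer_edges


# Toy graph: three nodes on a path 0 - 1 - 2.
x = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
edges = [(0, 1), (1, 2)]
x_aug, buffer_edges = insert_buffer_nodes(x, edges, alpha=0.25)
print(x_aug.shape)    # two buffer nodes are appended to the three originals
print(buffer_edges)
```

In this toy example the buffer node on edge (0, 1) receives the feature 0.25 * x[0] + 0.75 * x[1], so it sits closer to node 1 in feature space. Whether `alpha` is shared, fixed, or varies per edge is not stated in the caption, so a single scalar is used here.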