Graffin: Stand for Tails in Imbalanced Node Classification

Xiaorui Qi, Yanlong Wen, Xiaojie Yuan

TL;DR

Inspired by recurrent neural networks, Graffin flows head features into tail data through graph serialization techniques to alleviate the imbalance of tail representation. Results show that it can improve the adaptation to tail data without significantly degrading the overall model performance.

Abstract

Graph representation learning (GRL) models have succeeded in many scenarios. Real-world graphs, however, often have imbalanced distributions, for example over node labels and degrees, which poses a critical challenge for GRL: imbalanced inputs can lead to imbalanced outputs. Most existing works ignore this and assume that the input graph distribution is balanced, an assumption that does not match real situations and results in worse model performance on tail data. When graph neural networks (GNNs) are trained, head data dominates and tail data is left underrepresented. To address these issues, we propose Graffin, a pluggable tail data augmentation module. Inspired by recurrent neural networks (RNNs), Graffin flows head features into tail data through graph serialization techniques to alleviate the imbalance of tail representation. Local and global structures are fused to form the node representation under the combined effect of neighborhood and sequence information, which enriches the semantics of tail data. We validate the performance of Graffin on four real-world datasets in node classification tasks. Results show that Graffin can improve the adaptation to tail data without significantly degrading the overall model performance.
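
The idea in the abstract (serialize the graph, let head features flow along the sequence toward tail nodes, then fuse with 1-hop neighborhood information) can be illustrated with a toy NumPy sketch. This is not the authors' implementation: the ordering criterion (class size, head first), the fixed non-learned recurrence with a `decay` factor, the mean aggregation used as a stand-in for message passing, and the `alpha` fusion weight are all assumptions made for illustration, as are all function names.

```python
import numpy as np

def serialize_head_to_tail(labels):
    """Order node indices so that nodes of larger (head) classes come first.

    Assumption: class size is the serialization key; the paper may use another.
    """
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    size_of = dict(zip(classes.tolist(), counts.tolist()))
    sizes = np.array([size_of[c] for c in labels.tolist()])
    # Stable sort on negative size: head classes first, ties keep node order.
    return np.argsort(-sizes, kind="stable")

def recurrent_flow(x, order, decay=0.5):
    """Simple non-learned recurrence along the sequence (stand-in for an RNN)."""
    x = np.asarray(x, dtype=float)
    h = np.zeros(x.shape[1])
    out = np.zeros_like(x)
    for idx in order:
        # The running state carries features of earlier (head) nodes forward.
        h = decay * h + (1.0 - decay) * x[idx]
        out[idx] = h
    return out

def mean_neighbors(x, edges):
    """1-hop mean aggregation with self-loops (stand-in for message passing)."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    adj = np.eye(n)
    for u, v in edges:
        adj[u, v] = adj[v, u] = 1.0
    return (adj @ x) / adj.sum(axis=1, keepdims=True)

def fuse(x, labels, edges, alpha=0.5):
    """Weighted sum of local (neighborhood) and global (sequence) views."""
    order = serialize_head_to_tail(labels)
    return alpha * mean_neighbors(x, edges) + (1.0 - alpha) * recurrent_flow(x, order)

# Toy graph: three head nodes (class 0) and one tail node (class 1).
labels = [0, 0, 0, 1]
x = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
edges = [(0, 1), (1, 2), (2, 3)]
order = serialize_head_to_tail(labels)
flowed = recurrent_flow(x, order)
fused = fuse(x, labels, edges)
```

In this toy setup the tail node (index 3) starts with no mass on the first feature, yet `flowed[3]` ends up with a nonzero first component because the recurrent state accumulated it from the head nodes earlier in the sequence. A learned recurrent cell and a real GNN layer would replace the two stand-ins in practice.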

Paper Structure

This paper contains 26 sections, 15 equations, 4 figures, 3 tables, 1 algorithm.

Figures (4)

  • Figure 1: Imbalanced characteristics in node classification tasks. Top: Imbalanced distribution of node classes. Bottom: Imbalanced classification performance, where H and T denote the results of head and tail data.
  • Figure 2: Overview of the proposed Graffin method. (A) Example input graph with an imbalanced node class distribution. (B) Sequential global structure obtained via graph serialization (GS). (C) 1-hop local structure obtained via message passing (MP) neural networks. (D) Example framework with an exchangeable MP module and Graffin plugged in.
  • Figure 3: Accuracy for all classes w/o Graffin plugged in. The class indexes are sorted from head to tail according to the number of nodes in each class.
  • Figure 4: t-SNE visualization of GCN and GCN with Graffin (GCN+Gf) on all four datasets. Each color represents one node class from head data to tail data.
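
The per-class view described for Figure 3 (classes sorted from head to tail by node count, with an accuracy per class) can be computed along the following lines. This is a minimal sketch, not the paper's evaluation code: the function name is hypothetical, and class sizes are counted from the evaluated labels `y_true`, whereas the paper may count them over a different split.

```python
from collections import Counter

def per_class_accuracy_head_to_tail(y_true, y_pred):
    """Return (class, node_count, accuracy) tuples, largest class first."""
    counts = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    # most_common() sorts by count descending; equal counts keep first-seen order.
    return [(c, n, correct[c] / n) for c, n in counts.most_common()]

# Toy labels: class 0 is the head class, class 2 the tail class.
y_true = [0, 0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 0, 1, 1, 0, 0]
rows = per_class_accuracy_head_to_tail(y_true, y_pred)
for cls, n, acc in rows:
    print(f"class {cls}: {n} nodes, accuracy {acc:.2f}")
```

Overall accuracy on this toy input is 4/7, which hides the fact that the single tail-class node is misclassified. The per-class breakdown makes that gap visible.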