Mitigating Overfitting in Graph Neural Networks via Feature and Hyperplane Perturbation

Yoonhyuk Choi, Jiho Choi, Taewook Ko, Chong-Kwon Kim

TL;DR

We propose a perturbation technique that introduces variability into the initial features and the projection hyperplane of message-passing neural networks, which significantly improves node classification accuracy in semi-supervised settings.

Abstract

Graph neural networks (GNNs) are commonly used in semi-supervised settings. Previous research has primarily focused on finding appropriate graph filters (e.g., aggregation methods) that perform well on both homophilic and heterophilic graphs. While these methods are effective, they can still suffer from the sparsity of node features, where the initial data contain few non-zero elements. This can lead to overfitting in certain dimensions of the first projection matrix, as training samples may not cover the entire range of graph filters (hyperplanes). To address this, we propose a novel data augmentation strategy. Specifically, by flipping both the initial features and the hyperplane, we create additional space for training, which leads to more precise updates of the learnable parameters and improved robustness to unseen features during inference. To the best of our knowledge, this is the first attempt to mitigate the overfitting caused by the initial features. Extensive experiments on real-world datasets show that our proposed technique improves node classification accuracy by up to 46.5% in relative terms.
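The sparsity issue described in the abstract can be illustrated with a small sketch. For a linear first layer $H = XW$, the gradient with respect to $W$ is $X^\top (\partial L / \partial H)$, so any feature dimension that is zero for every training node leaves the matching row of $W$ without updates. The toy matrices below, the helper names, and the flip `x -> 1 - x` are assumptions chosen for illustration only; they are not the paper's exact formulation of the feature and hyperplane perturbation.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def first_layer_grad(features, upstream):
    """Gradient of the loss w.r.t. W for H = X W, i.e. X^T (dL/dH)."""
    return matmul(transpose(features), upstream)


def dead_rows(grad):
    """Indices of rows of W that receive an all-zero gradient."""
    return [j for j, row in enumerate(grad) if all(v == 0.0 for v in row)]


# Toy sparse features: 3 training nodes, 4 dimensions.
# Dimension 2 is zero for every training node.
X = [
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
]

# Hypothetical upstream gradient dL/dH (3 nodes, 2 hidden units).
G = [
    [0.5, -0.2],
    [-0.1, 0.3],
    [0.4, 0.1],
]

grad_W = first_layer_grad(X, G)
dead = dead_rows(grad_W)  # row 2 of W gets no update

# Assumed perturbation, for illustration only: flip binary features.
X_flipped = [[1.0 - v for v in row] for row in X]
grad_W_flipped = first_layer_grad(X_flipped, G)
dead_flipped = dead_rows(grad_W_flipped)  # no row is left without a gradient

print("rows without gradient (original):", dead)
print("rows without gradient (flipped): ", dead_flipped)
```

In this toy setting, the row of $W$ tied to the always-zero dimension is trained only in the flipped view, which is one way to read the abstract's claim that perturbation creates additional space for training.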

Paper Structure (21 sections, 18 equations, 5 figures, 5 tables)

Figures (5)

  • Figure 1: The colored area in $W$ marks parameters that receive no gradient update during back-propagation, because the update depends on whether the corresponding element of the input $X$ is zero. Consequently, the parameters are optimized only along the x-axis during training
  • Figure 2: Translation invariance of convolutional networks. The white square represents the sliding convolution filter, while the blue squares indicate non-zero outputs
  • Figure 3: The figure above illustrates (a) the method of shifting the first matrix $W_1$ and (b) the overall architecture of Shift-GNN. As shown, the parameters are shared and updated iteratively in both spaces
  • Figure 4: (RQ2) Node classification accuracy of MLP, GCN, GAT, and their shifted versions as functions of the parameter $\varepsilon$ on the Cora dataset. In the left figure, we shift only the features, whereas the right figure shows the performance when both the features and the first plane are shifted
  • Figure 5: (RQ4) Using the Cora dataset, we plot the magnitude of the first projection matrix gradients and their standard deviation ($\sigma$) during training epochs ($i$)