Table of Contents
Fetching ...

Graph Neural Networks with Coarse- and Fine-Grained Division for Mitigating Label Sparsity and Noise

Shuangjie Li, Baoming Zhang, Jianqing Song, Gaoli Ruan, Chongjun Wang, Junyuan Xie

TL;DR

This work investigates the effectiveness of linking unlabeled nodes to cleanly labeled nodes, and proposes a clean labels oriented link that connects unlabeled nodes to cleanly labeled nodes, aimed at mitigating label sparsity and promoting supervision propagation.

Abstract

Graph Neural Networks (GNNs) have gained considerable prominence in semi-supervised learning tasks in processing graph-structured data, primarily owing to their message-passing mechanism, which largely relies on the availability of clean labels. However, in real-world scenarios, labels on nodes of graphs are inevitably noisy and sparsely labeled, significantly degrading the performance of GNNs. Exploring robust GNNs for semi-supervised node classification in the presence of noisy and sparse labels remains a critical challenge. Therefore, we propose a novel \textbf{G}raph \textbf{N}eural \textbf{N}etwork with \textbf{C}oarse- and \textbf{F}ine-\textbf{G}rained \textbf{D}ivision for mitigating label sparsity and noise, namely GNN-CFGD. The key idea of GNN-CFGD is reducing the negative impact of noisy labels via coarse- and fine-grained division, along with graph reconstruction. Specifically, we first investigate the effectiveness of linking unlabeled nodes to cleanly labeled nodes, demonstrating that this approach is more effective in combating labeling noise than linking to potentially noisy labeled nodes. Based on this observation, we introduce a Gaussian Mixture Model (GMM) based on the memory effect to perform a coarse-grained division of the given labels into clean and noisy labels. Next, we propose a clean labels oriented link that connects unlabeled nodes to cleanly labeled nodes, aimed at mitigating label sparsity and promoting supervision propagation. Furthermore, to provide refined supervision for noisy labeled nodes and additional supervision for unlabeled nodes, we fine-grain the noisy labeled and unlabeled nodes into two candidate sets based on confidence, respectively. Extensive experiments on various datasets demonstrate the superior effectiveness and robustness of GNN-CFGD.

Graph Neural Networks with Coarse- and Fine-Grained Division for Mitigating Label Sparsity and Noise

TL;DR

This work investigates the effectiveness of linking unlabeled nodes to cleanly labeled nodes, and proposes a clean labels oriented link that connects unlabeled nodes to cleanly labeled nodes, aimed at mitigating label sparsity and promoting supervision propagation.

Abstract

Graph Neural Networks (GNNs) have gained considerable prominence in semi-supervised learning tasks in processing graph-structured data, primarily owing to their message-passing mechanism, which largely relies on the availability of clean labels. However, in real-world scenarios, labels on nodes of graphs are inevitably noisy and sparsely labeled, significantly degrading the performance of GNNs. Exploring robust GNNs for semi-supervised node classification in the presence of noisy and sparse labels remains a critical challenge. Therefore, we propose a novel \textbf{G}raph \textbf{N}eural \textbf{N}etwork with \textbf{C}oarse- and \textbf{F}ine-\textbf{G}rained \textbf{D}ivision for mitigating label sparsity and noise, namely GNN-CFGD. The key idea of GNN-CFGD is reducing the negative impact of noisy labels via coarse- and fine-grained division, along with graph reconstruction. Specifically, we first investigate the effectiveness of linking unlabeled nodes to cleanly labeled nodes, demonstrating that this approach is more effective in combating labeling noise than linking to potentially noisy labeled nodes. Based on this observation, we introduce a Gaussian Mixture Model (GMM) based on the memory effect to perform a coarse-grained division of the given labels into clean and noisy labels. Next, we propose a clean labels oriented link that connects unlabeled nodes to cleanly labeled nodes, aimed at mitigating label sparsity and promoting supervision propagation. Furthermore, to provide refined supervision for noisy labeled nodes and additional supervision for unlabeled nodes, we fine-grain the noisy labeled and unlabeled nodes into two candidate sets based on confidence, respectively. Extensive experiments on various datasets demonstrate the superior effectiveness and robustness of GNN-CFGD.

Paper Structure

This paper contains 25 sections, 22 equations, 10 figures, 2 tables, 1 algorithm.

Figures (10)

  • Figure 1: The illustration of message-passing for unlabeled node. (a) linkL (such as NRGNN dai2021nrgnn and RTGNN qian2023robust): Directly linking unlabeled nodes to labeled ones may inadvertently connect to noisy labeled nodes; (b) linkCL: Linking unlabeled nodes only to clean labeled nodes. We use unlabeled nodes $v_4$ as an example.
  • Figure 2: The performance of GCN in node classification with different label noise rates on the Citeseer dataset with the setting: 1) for linkL, we connect each unlabeled node to the top 50 labeled nodes that are most similar to it based on their features, and 2) for linkCL, we connect each unlabeled node only to the clean nodes of top 50 labeled nodes that are most similar to it based on their features.
  • Figure 3: An illustration of the proposed model GNN-CFGD, which consists of (i) coarse-grained division based on GMM, (ii) clean labels oriented link, and (iii) fine-grained division based on confidence.
  • Figure 4: Results with different label rates on Cora-ML dataset.
  • Figure 5: Results with different label rates on CiteSeer dataset.
  • ...and 5 more figures