Table of Contents
Fetching ...

Training a Label-Noise-Resistant GNN with Reduced Complexity

Rui Zhao, Bin Shi, Zhiming Liang, Jianfei Ruan, Bo Dong, Lu Lin

TL;DR

The Label Ensemble Graph Neural Network (LEGNN) is introduced, a lower complexity method for robust GNNs training against label noise that reframes NCLN as a label ensemble task, gathering informative multiple labels instead of constructing a single reliable label, avoiding high-complexity computations for reliability assessment.

Abstract

Graph Neural Networks (GNNs) have been widely employed for semi-supervised node classification tasks on graphs. However, the performance of GNNs is significantly affected by label noise, that is, a small amount of incorrectly labeled nodes can substantially misguide model training. Mainstream solutions define node classification with label noise (NCLN) as a reliable labeling task, often introducing node similarity with quadratic computational complexity to more accurately assess label reliability. To this end, in this paper, we introduce the Label Ensemble Graph Neural Network (LEGNN), a lower complexity method for robust GNNs training against label noise. LEGNN reframes NCLN as a label ensemble task, gathering informative multiple labels instead of constructing a single reliable label, avoiding high-complexity computations for reliability assessment. Specifically, LEGNN conducts a two-step process: bootstrapping neighboring contexts and robust learning with gathered multiple labels. In the former step, we apply random neighbor masks for each node and gather the predicted labels as a high-probability label set. This mitigates the impact of inaccurately labeled neighbors and diversifies the label set. In the latter step, we utilize a partial label learning based strategy to aggregate the high-probability label information for model training. Additionally, we symmetrically gather a low-probability label set to counteract potential noise from the bootstrapped high-probability label set. Extensive experiments on six datasets demonstrate that LEGNN achieves outstanding performance while ensuring efficiency. Moreover, it exhibits good scalability on dataset with over one hundred thousand nodes and one million edges.

Training a Label-Noise-Resistant GNN with Reduced Complexity

TL;DR

The Label Ensemble Graph Neural Network (LEGNN) is introduced, a lower complexity method for robust GNNs training against label noise that reframes NCLN as a label ensemble task, gathering informative multiple labels instead of constructing a single reliable label, avoiding high-complexity computations for reliability assessment.

Abstract

Graph Neural Networks (GNNs) have been widely employed for semi-supervised node classification tasks on graphs. However, the performance of GNNs is significantly affected by label noise, that is, a small amount of incorrectly labeled nodes can substantially misguide model training. Mainstream solutions define node classification with label noise (NCLN) as a reliable labeling task, often introducing node similarity with quadratic computational complexity to more accurately assess label reliability. To this end, in this paper, we introduce the Label Ensemble Graph Neural Network (LEGNN), a lower complexity method for robust GNNs training against label noise. LEGNN reframes NCLN as a label ensemble task, gathering informative multiple labels instead of constructing a single reliable label, avoiding high-complexity computations for reliability assessment. Specifically, LEGNN conducts a two-step process: bootstrapping neighboring contexts and robust learning with gathered multiple labels. In the former step, we apply random neighbor masks for each node and gather the predicted labels as a high-probability label set. This mitigates the impact of inaccurately labeled neighbors and diversifies the label set. In the latter step, we utilize a partial label learning based strategy to aggregate the high-probability label information for model training. Additionally, we symmetrically gather a low-probability label set to counteract potential noise from the bootstrapped high-probability label set. Extensive experiments on six datasets demonstrate that LEGNN achieves outstanding performance while ensuring efficiency. Moreover, it exhibits good scalability on dataset with over one hundred thousand nodes and one million edges.

Paper Structure

This paper contains 22 sections, 14 equations, 4 figures, 11 tables, 3 algorithms.

Figures (4)

  • Figure 1: A comparison between our proposed label ensemble method and existing reliable labeling methods. The colors of the nodes represent their labels, while the symbols "✓" and "×" indicate whether the nodes are correctly or incorrectly labeled, respectively. Intuitively, our method avoids directly mistaking erroneous labels as correct by label ensemble. The experimental details of (b) are inline with Section \ref{['sec:em']}.
  • Figure 2: The overall framework of LEGNN. The nodes are colored differently to indicate their distinct labels, with white representing target nodes to be labeled. LEGNN first generates a masked graph by bootstrapping the neighbor context. It then infers label probabilities on the masked graph to identify high-probability and low-probability label sets for the target nodes. Finally, robust model training is conducted using the PLL based strategy with these two label sets.
  • Figure 3: The classification accuracy decreases as the number of labels decreases. Experiments were conducted on the Citeseer dataset, with the left figure representing the Sym-50% noise and the right figure representing the Pair-40% noise.
  • Figure 4: Classification accuracy (%) with different parameters on Citeseer.