Resurrecting Label Propagation for Graphs with Heterophily and Label Noise

Yao Cheng, Caihua Shan, Yifei Shen, Xiang Li, Siqiang Luo, Dongsheng Li

TL;DR

This work tackles graph label noise under arbitrary heterophily by showing that graph homophily mitigates label noise and that LP-based methods excel as homophily rises. It introduces $R^{2}$LP, an iterative framework that reconstructs a homophilous graph ($Z^{(L)*}$), rectifies noisy labels via label propagation, and augments the clean set through high-confidence selections, with a linear-time acceleration and theoretical denoising guarantees. Empirically, $R^{2}$LP consistently outperforms 10 baselines across 10 datasets with varying noise and heterophily levels, and ablations confirm the importance of each component. The method offers a scalable, robust solution for semi-supervised node classification in noisy, heterophilous graphs, with potential applicability to real-world networks where label noise and heterophily co-occur.
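
The homophily level referred to above is commonly quantified as the edge homophily ratio: the fraction of edges whose two endpoints share a label. The following is a minimal sketch of that standard definition, not code from the paper; the function name `edge_homophily` and the toy graph are assumptions made for illustration.

```python
def edge_homophily(edges, labels):
    """Fraction of edges whose endpoints share a label.

    Values near 1 indicate a homophilous graph, values near 0 a
    heterophilous one. `edges` is a list of (u, v) node-index pairs and
    `labels` maps a node index to its class (hypothetical input layout).
    """
    if not edges:
        raise ValueError("edge list is empty")
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


# Toy 4-cycle: two edges join same-label nodes, two join different labels.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
labels = [0, 0, 1, 1]
h = edge_homophily(edges, labels)
```

On this toy cycle half of the edges are intra-class, so the graph sits midway between the homophilous and heterophilous regimes.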

Abstract

Label noise is a common challenge in large datasets, as it can significantly degrade the generalization ability of deep neural networks. Most existing studies focus on noisy labels in computer vision; however, graph models take both node features and graph topology as input, and become more susceptible to label noise through message-passing mechanisms. Only a few recent works tackle label noise on graphs. One significant limitation is that they assume the graph exhibits homophily and that the labels are distributed smoothly. However, real-world graphs can exhibit varying degrees of heterophily, or even be dominated by heterophily, which makes the current methods inadequate. In this paper, we study graph label noise in the context of arbitrary heterophily, with the aim of rectifying noisy labels and assigning labels to previously unlabeled nodes. We begin by conducting two empirical analyses to explore the impact of graph homophily on graph label noise. Following these observations, we propose an efficient algorithm, denoted as $R^{2}$LP. Specifically, $R^{2}$LP is an iterative algorithm with three steps: (1) reconstruct the graph to recover the homophily property, (2) utilize label propagation to rectify the noisy labels, (3) select high-confidence labels to retain for the next iteration. By iterating these steps, we obtain a set of correct labels, ultimately achieving high accuracy in the node classification task. We also provide a theoretical analysis to demonstrate its denoising effect. Finally, we perform experiments on ten benchmark datasets with different levels of graph heterophily and various types of noise. In these experiments, we compare the performance of $R^{2}$LP against ten typical baseline methods. Our results illustrate the superior performance of the proposed $R^{2}$LP.
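
The three-step loop described in the abstract can be sketched roughly as follows. This is a toy, dependency-free illustration under stated assumptions, not the authors' implementation: the graph reconstruction step is replaced by a plain cosine-similarity graph over node features, propagation uses the classic update $F \leftarrow \alpha S F + (1-\alpha) Y$ seeded from the clean set only, and the names `cosine_graph`, `propagate`, `rectify_labels`, `alpha`, `threshold`, and `rounds` are all hypothetical.

```python
import math


def cosine_graph(X):
    """Stand-in for step (1): row-normalized cosine similarity of features.
    (Assumption: the paper's reconstruction is replaced by this simple graph.)"""
    n = len(X)
    norms = [math.sqrt(sum(v * v for v in x)) or 1.0 for x in X]
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                dot = sum(a * b for a, b in zip(X[i], X[j]))
                S[i][j] = max(dot / (norms[i] * norms[j]), 0.0)
        total = sum(S[i]) or 1.0
        S[i] = [s / total for s in S[i]]
    return S


def propagate(S, Y, alpha=0.8, steps=20):
    """Step (2): classic label propagation, F <- alpha * S F + (1 - alpha) * Y."""
    n, c = len(Y), len(Y[0])
    F = [row[:] for row in Y]
    for _ in range(steps):
        F = [[alpha * sum(S[i][j] * F[j][k] for j in range(n))
              + (1 - alpha) * Y[i][k] for k in range(c)]
             for i in range(n)]
    return F


def rectify_labels(X, noisy, clean, num_classes, rounds=3, threshold=0.6):
    """noisy: dict node -> observed (possibly wrong) label.
    clean: dict node -> trusted label. Returns (predictions, grown clean set)."""
    clean = dict(clean)
    n = len(X)
    # The stand-in graph depends only on features, so it is built once here.
    S = cosine_graph(X)
    F = None
    for _ in range(rounds):
        # Seed propagation with the current clean set (an assumption of this sketch).
        Y = [[0.0] * num_classes for _ in range(n)]
        for node, y in clean.items():
            Y[node][y] = 1.0
        F = propagate(S, Y)
        # Step (3): move confidently predicted noisy nodes into the clean set,
        # overwriting their observed label with the propagated one.
        for node in noisy:
            if node in clean:
                continue
            total = sum(F[node])
            best = F[node].index(max(F[node]))
            if total > 0 and F[node][best] / total >= threshold:
                clean[node] = best
    pred = [F[i].index(max(F[i])) for i in range(n)]
    for node, y in clean.items():
        pred[node] = y
    return pred, clean


# Toy data: nodes 0-2 belong to class 0, nodes 3-5 to class 1.
X = [[1.0, 0.0], [1.0, 0.05], [0.9, 0.0],
     [0.0, 1.0], [0.05, 1.0], [0.0, 0.9]]
noisy = {1: 0, 2: 1, 4: 1}   # node 2 carries a flipped label
clean = {0: 0, 3: 1}         # small trusted set; node 5 is unlabeled
pred, grown_clean = rectify_labels(X, noisy, clean, num_classes=2)
```

In this toy setting the flipped label on node 2 is overridden by its class-0 neighborhood and the unlabeled node 5 receives a class-1 prediction. Seeding from the clean set alone is a simplification chosen to keep the sketch short.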

Paper Structure

This paper contains 35 sections, 6 theorems, 27 equations, 7 figures, 8 tables, 1 algorithm.

Key Result

Theorem 1

(Label Propagation and Denoising) Suppose the label noise is generated by $\bm{T}$ and the label propagation follows Equation eq:lp. For a specific node $i$, we further assume that it has $d$ neighbors and that each neighbor has the same true label as node $i$ with probability $p$, i.e., where we have

Figures (7)

  • Figure 1: The classification accuracy with flip label noise ranging from 0% to 60% on Cornell and Actor datasets.
  • Figure 2: The impact of edge homophily on graph label noise across various methods (flip noise).
  • Figure 3: The performance of LPM and NRGNN incorporated with three graph reconstruction modules.
  • Figure 4: Ablation study on the main components of $R^{2}$LP.
  • Figure 5: The impact of edge homophily on graph label noise across various methods (uniform noise).
  • ...and 2 more figures

Theorems & Definitions (10)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • proof
  • Theorem 4
  • proof
  • Theorem
  • proof
  • Theorem
  • proof