Active Learning for Graphs with Noisy Structures

Hongliang Chi, Cong Qi, Suhang Wang, Yao Ma

TL;DR

This paper tackles active learning on graphs with noisy structures by proposing GALClean, an iterative framework that jointly performs data selection and graph cleaning. Node representations are learned in a decoupled, noise-robust manner, while a cleanliness-aware node selection strategy and an edge-predictor-based graph cleaning mechanism progressively purify the graph. The authors frame GALClean as a stochastic EM algorithm; GALClean+ extends it by running additional EM iterations after the labeling budget is exhausted, to further improve graph quality. Experiments across diverse datasets show robustness gains under structural noise and adversarial attacks, underscoring the practical value of jointly optimizing data labeling and graph structure purification in noisy real-world graphs.

Abstract

Graph Neural Networks (GNNs) have seen significant success in tasks such as node classification, but this success is largely contingent upon the availability of sufficient labeled nodes. The excessive cost of labeling large-scale graphs has therefore led to a focus on active learning on graphs, which aims to select data effectively so as to maximize downstream model performance. Notably, most existing methods assume a reliable graph topology, while real-world graphs are often noisy. Designing a successful active learning framework for noisy graphs is thus much needed but challenging, because selecting data for labeling and obtaining a clean graph are naturally interdependent tasks: selecting high-quality data requires a clean graph structure, while cleaning a noisy graph structure requires sufficient labeled data. To address this interdependence, we propose an active learning framework, GALClean, which iteratively performs data selection and graph purification together, each using the best information learned in the prior iteration. Importantly, we summarize GALClean as an instance of the Expectation-Maximization algorithm, which provides a theoretical understanding of its design and mechanisms. This theory naturally leads to an enhanced version, GALClean+. Extensive experiments demonstrate the effectiveness and robustness of our proposed method across various types and levels of noisy graphs.
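To make the interdependence between data selection and graph purification concrete, here is a minimal toy sketch of an alternating "select, then clean" loop. It is not the authors' implementation. Every name in it (`toy_noisy_graph`, `propagate`, `select_and_clean`, `per_round`) is hypothetical. Farthest-point selection stands in for the paper's node selection strategy, and a nearest-centroid label-agreement rule stands in for its edge predictor. These are assumptions chosen only to keep the example small and runnable.

```python
import numpy as np


def toy_noisy_graph():
    """Hypothetical toy data: two 4-node cliques plus two injected noisy edges."""
    labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
    feats = np.eye(2)[labels]  # class 0 -> [1, 0], class 1 -> [0, 1]
    adj = (labels[:, None] == labels[None, :]).astype(float)
    np.fill_diagonal(adj, 0.0)
    for u, v in [(0, 4), (3, 7)]:  # noisy cross-cluster edges
        adj[u, v] = adj[v, u] = 1.0
    return adj, feats, labels


def propagate(adj, feats):
    """Mean-aggregate features over each node's neighbourhood (self included)."""
    a = adj + np.eye(adj.shape[0])
    return (a @ feats) / a.sum(axis=1, keepdims=True)


def select_and_clean(adj, feats, oracle, budget, per_round=2, seed=0):
    """Alternate between querying labels and pruning suspicious edges.

    Assumption: both the selection rule and the cleaning rule below are
    simple stand-ins, not the strategies proposed in the paper.
    """
    rng = np.random.default_rng(seed)
    adj = adj.astype(float).copy()
    labeled = {}  # node index -> queried label
    while len(labeled) < budget:
        h = propagate(adj, feats)  # representations on the current graph
        unlabeled = [i for i in range(adj.shape[0]) if i not in labeled]
        k = min(per_round, budget - len(labeled), len(unlabeled))
        if k == 0:
            break
        if not labeled:
            # Cold start: nothing is known yet, so query random nodes.
            picks = rng.choice(unlabeled, size=k, replace=False)
        else:
            # Stand-in selection: query the nodes farthest from any labeled node.
            known = h[list(labeled)]
            diff = h[unlabeled][:, None, :] - known[None, :, :]
            dist = (diff ** 2).sum(-1).min(axis=1)
            picks = [unlabeled[j] for j in np.argsort(-dist)[:k]]
        for i in picks:
            labeled[int(i)] = int(oracle(int(i)))

        # Stand-in edge predictor: nearest-centroid pseudo-labels for all nodes.
        classes = sorted(set(labeled.values()))
        centroids = np.stack([
            h[[i for i, y in labeled.items() if y == c]].mean(axis=0)
            for c in classes
        ])
        sq = ((h[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        pred = np.array(classes)[sq.argmin(axis=1)]
        for i, y in labeled.items():
            pred[i] = y  # queried labels override pseudo-labels

        # Cleaning step: drop edges whose endpoints are predicted to disagree.
        adj = adj * (pred[:, None] == pred[None, :])
    return adj, labeled
```

Calling `select_and_clean(adj, feats, lambda i: labels[i], budget=4)` on the output of `toy_noisy_graph()` prunes the two injected cross-cluster edges once both classes have at least one labeled node. The cleaned graph then feeds the next round's representations, which is the feedback loop the abstract describes.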

Paper Structure

This paper contains 28 sections, 18 equations, 4 figures, and 4 tables.

Figures (4)

  • Figure 1: Overall framework of GALClean.
  • Figure 2: Active learning performance under random edge-adding attacks.
  • Figure 3: Active learning performance under unsupervised adversarial attacks.
  • Figure 4: Parameter sensitivity analysis for $\kappa$.