Active Learning for Graph Neural Networks via Node Feature Propagation

Yuexin Wu, Yichong Xu, Aarti Singh, Yiming Yang, Artur Dubrawski

TL;DR

This paper tackles label-efficient node classification on graphs by introducing FeatProp, a strategy that selects training nodes using node feature propagation through the graph followed by K-Medoids clustering. It provides a theoretical bound linking the expected loss to the geometry of propagated features and demonstrates consistent empirical gains over strong baselines across multiple benchmark graphs. The approach is robust to under-trained representations and avoids reliance on final-layer embeddings, offering practical benefits for scenarios with limited labeling budgets. Overall, FeatProp advances active learning for graph neural networks by marrying propagation-based representations with principled clustering, yielding improved performance and efficiency.

Abstract

Graph Neural Networks (GNNs) for prediction tasks like node classification or edge prediction have received increasing attention in recent machine learning from graphically structured data. However, a large quantity of labeled graphs is difficult to obtain, which significantly limits the true success of GNNs. Although active learning has been widely studied for addressing label-sparse issues with other data types like text, images, etc., how to make it effective over graphs is an open question for research. In this paper, we present an investigation on active learning with GNNs for node classification tasks. Specifically, we propose a new method, which uses node feature propagation followed by K-Medoids clustering of the nodes for instance selection in active learning. With a theoretical bound analysis we justify the design choice of our approach. In our experiments on four benchmark datasets, the proposed method outperforms other representative baseline methods consistently and significantly.
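
The selection pipeline described in the abstract (propagate node features, then cluster with K-Medoids and label the medoids) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the symmetric normalization with self-loops, the default of two propagation hops, the Euclidean distance, and the simple alternating K-Medoids heuristic are all assumptions, and the function names `propagate_features`, `pairwise_distances`, `k_medoids`, and `select_nodes` are hypothetical.

```python
import numpy as np


def propagate_features(adj, feats, hops=2):
    """Smooth features over the graph by applying S `hops` times.

    Assumption: S = D^{-1/2} (A + I) D^{-1/2}, the GCN-style normalization.
    """
    a_hat = adj + np.eye(adj.shape[0])            # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    s = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    out = np.asarray(feats, dtype=float)
    for _ in range(hops):
        out = s @ out
    return out


def pairwise_distances(x):
    """Euclidean distance between every pair of rows of x."""
    sq = (x ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    dist = np.sqrt(np.maximum(d2, 0.0))           # clip tiny negatives
    np.fill_diagonal(dist, 0.0)
    return dist


def k_medoids(dist, k, n_iter=50, seed=0):
    """Alternating K-Medoids heuristic (assumed variant; finds a local optimum)."""
    rng = np.random.default_rng(seed)
    medoids = rng.choice(dist.shape[0], size=k, replace=False)
    for _ in range(n_iter):
        assign = dist[:, medoids].argmin(axis=1)  # nearest medoid per node
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(assign == c)
            if members.size == 0:
                continue                          # keep the old medoid
            within = dist[np.ix_(members, members)].sum(axis=0)
            new_medoids[c] = members[within.argmin()]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return np.sort(medoids)


def select_nodes(adj, feats, budget, hops=2, seed=0):
    """Return indices of `budget` nodes to send for labeling."""
    prop = propagate_features(adj, feats, hops)
    return k_medoids(pairwise_distances(prop), budget, seed=seed)
```

Calling `select_nodes(adj, feats, budget=b)` with a dense adjacency matrix and a feature matrix returns `b` node indices. Because the distances depend only on the graph and the raw features, this selection does not require a trained model.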

Paper Structure

This paper contains 16 sections, 3 theorems, 23 equations, 5 figures, 3 tables, 1 algorithm.

Key Result

Theorem 1

Suppose that the label vector $Y$ is sampled independently from the distribution $y_i\sim \eta(i)$, and the loss function $l$ is bounded by $[-L,L]$. Then under mild assumptions, there exists a constant $c_0$ such that with probability $1-\delta$ the expected classification loss of ${\mathcal{A}}_t$

Figures (5)

  • Figure 1: Visualization of Theorem 1. Consider the set of selected points $\mathbf{s}$ and the remaining points in the dataset $[n]\backslash \mathbf{s}$. K-Medoids corresponds to the mean of all red segments in the figure, whereas K-Center corresponds to the max of all red segments in the figure.
  • Figure 2: Results of different approaches over benchmark datasets averaged from 5 different runs.
  • Figure 3: Results of different approaches over benchmark datasets averaged from 5 different runs. Similar to Coreset, the orange line denotes replacing the original distance function in Eqn. (\ref{eqn:dist}) with L2 distance from the final GCN layer. The blue line denotes the algorithm replacing the K-Medoids module with K-Center clustering.
  • Figure 4: Results of different approaches over benchmark datasets averaged from 5 different runs on an SGC framework.
  • Figure 5: Results of SGC vs GCN over benchmark datasets averaged from 5 different runs by using FeatProp.
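
The distinction drawn in the Figure 1 caption, mean versus max over the red segments, can be made concrete with a small sketch. It assumes each red segment joins an unselected point to its nearest selected point and that distances are Euclidean; the helper name `coverage_objectives` is hypothetical, and at least one point must remain unselected.

```python
import numpy as np


def coverage_objectives(points, selected):
    """Return (mean, max) of nearest-selected-point distances.

    The mean is the K-Medoids-style objective and the max is the
    K-Center-style objective, both over the unselected points only.
    """
    points = np.asarray(points, dtype=float)
    selected = np.asarray(selected)
    rest = np.setdiff1d(np.arange(len(points)), selected)
    # Distance from every unselected point to every selected point.
    diffs = points[rest][:, None, :] - points[selected][None, :, :]
    nearest = np.linalg.norm(diffs, axis=2).min(axis=1)
    return nearest.mean(), nearest.max()


# Four points on a line; only the point at x = 1 is selected.
mean_seg, max_seg = coverage_objectives([[0.0], [1.0], [2.0], [10.0]], [1])
```

Here the segments have lengths 1, 1, and 9, so the mean is 11/3 while the max is 9. A single outlier dominates the K-Center objective but is averaged down in the K-Medoids objective.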

Theorems & Definitions (4)

  • Theorem 1: informal
  • Theorem 2
  • proof
  • Theorem 3: Hoeffding's Inequality (Hoeffding, 1994)
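
For reference, the standard form of Hoeffding's inequality is given below, followed by its specialization to the bounded-loss setting of Theorem 1. How the paper applies the inequality in its proof is not shown in this summary, so the specialization is only an illustration; the symbols $X_i$, $a_i$, $b_i$, $n$, $t$, and $\epsilon$ are introduced here and are not the paper's notation. For independent random variables $X_1,\dots,X_n$ with $X_i \in [a_i, b_i]$:

$$
\Pr\left( \left| \sum_{i=1}^{n} \left( X_i - \mathbb{E}[X_i] \right) \right| \ge t \right) \le 2 \exp\left( - \frac{2 t^2}{\sum_{i=1}^{n} (b_i - a_i)^2} \right)
$$

If every $X_i$ lies in $[-L, L]$, then $b_i - a_i = 2L$. Setting $t = n\epsilon$ and equating the right-hand side to $\delta$ shows that, with probability at least $1-\delta$,

$$
\left| \frac{1}{n} \sum_{i=1}^{n} \left( X_i - \mathbb{E}[X_i] \right) \right| \le L \sqrt{\frac{2 \ln(2/\delta)}{n}}
$$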