Topology-Aware Active Learning on Graphs

Harris Hardiman-Mostow, Jack Mauro, Adrien Weihs, Andrea L. Bertozzi

TL;DR

This work targets label-efficient learning on graphs by leveraging topology through Balanced Forman Curvature (BFC) and multiscale graph Laplacian regularization. It introduces Curvature Coreset (CC) to select a diverse initial labeled set and a data-driven stopping signal, plus a curvature-based mechanism to switch from exploration to exploitation within PWLL-$\tau$. It further proposes a localized graph rewiring strategy to incorporate multiscale information around labeled nodes, significantly improving label propagation while preserving sparsity. Across benchmarks, CC and curvature-driven PWLL-$\tau$ show strong improvements at low label rates, and localized rewiring achieves substantial accuracy gains with orders-of-magnitude speedups over full multiscale methods, highlighting practical gains for graph-based active learning.

Abstract

We propose a graph-topological approach to active learning that directly targets the core challenge of exploration versus exploitation under scarce label budgets. To guide exploration, we introduce a coreset construction algorithm based on Balanced Forman Curvature (BFC), which selects representative initial labels that reflect the graph's cluster structure. This method includes a data-driven stopping criterion that signals when the graph has been sufficiently explored. We further use BFC to dynamically trigger the shift from exploration to exploitation within active learning routines, replacing hand-tuned heuristics. To improve exploitation, we introduce a localized graph rewiring strategy that efficiently incorporates multiscale information around labeled nodes, enhancing label propagation while preserving sparsity. Experiments on benchmark classification tasks show that our methods consistently outperform existing graph-based semi-supervised baselines at low label rates.

Paper Structure

This paper contains 24 sections, 23 equations, 8 figures, 4 tables, 5 algorithms.

Figures (8)

  • Figure 1: From weihs2025Hypergraphs: higher-order smoothness is imposed on the labeling function $v$ in denser regions, while allowing greater flexibility in sparser areas.
  • Figure 2: Coreset points chosen by CC and DAC at different iterations of coreset selection on the Blobs dataset. CC selects exactly one point from each of the eight clusters by the eighth iteration, which is the most efficient possible exploration of the dataset's cluster structure. Moreover, the selected points all lie toward the center of their clusters (they are not outliers) due to the $d_i$ terms in $\text{Ric}(i,j)$. Conversely, DAC is inefficient: it needs over twice as many iterations to sample from every cluster and often samples near the edge of the dataset.
  • Figure 3: Illustration of our proposed stopping condition on MNIST. Across datasets, the value of $c$ in Algorithm \ref{alg:curvature} steadily grows until a certain "saturation" point when it starts rapidly increasing. This indicates, according to the curvature metric, that the graph topology has been sufficiently explored, and exploitative active learning may begin. The red star indicates when the online Z-score stopping condition triggers (Algorithm \ref{alg:curvature_stopping}), corresponding to the first large jump in curvature values among coreset points.
  • Figure 4: Coreset and AL results for Curvature, DAC, and Random on several benchmarks. The left and right columns present results when AL begins at 50 and 100 labels, respectively. The solid line indicates the mean and shaded region indicates one standard deviation over 10 trials. Our method significantly outperforms the others, especially at the lower label rates.
  • Figure 5: Coreset and AL accuracy comparison between our method and DAC (with different radii), where we use each method's stopping condition. The like-colored dashed line indicates where the stopping condition is triggered (and AL begins) for each method. Across datasets, and regardless of when stopping conditions are triggered, our method significantly outperforms DAC, particularly at lower label rates.
  • ...and 3 more figures
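
The online Z-score stopping condition mentioned in the Figure 3 caption can be illustrated with a minimal sketch. This is not the paper's Algorithm for curvature stopping, whose details are not given here; it only shows the general idea of flagging the first value that is an outlier relative to the running statistics of earlier values. The names `OnlineZScore` and `first_jump`, the `threshold` of 3.0, the `warmup` length, and the use of Welford's running-variance update are all assumptions made for illustration.

```python
import math


class OnlineZScore:
    """Running mean and variance via Welford's update (illustrative helper)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def zscore(self, x):
        """Z-score of x relative to the values seen so far (0.0 if undefined)."""
        if self.n < 2:
            return 0.0
        std = math.sqrt(self.m2 / (self.n - 1))
        return 0.0 if std == 0.0 else (x - self.mean) / std

    def update(self, x):
        """Fold x into the running statistics."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)


def first_jump(values, threshold=3.0, warmup=5):
    """Return the first index whose value exceeds the running mean by more
    than `threshold` running standard deviations, or None if there is none.

    `threshold` and `warmup` are hypothetical defaults, not values from the paper.
    """
    stats = OnlineZScore()
    for t, x in enumerate(values):
        # Score the new value against past values only, then absorb it.
        if t >= warmup and stats.zscore(x) > threshold:
            return t
        stats.update(x)
    return None


# Synthetic stand-in for the per-iteration curvature value c:
# slow growth followed by a sudden jump.
curvature_trace = [0.10, 0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.90, 1.50]
stop_at = first_jump(curvature_trace)
```

Scoring each new value before adding it to the running statistics keeps the jump itself from inflating the variance it is compared against, which is why `zscore` is called before `update`.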

Theorems & Definitions (6)

  • Definition 2.1: Neighborhoods of $x_i \sim x_j$
  • Definition 2.2: Balanced Forman Curvature
  • Remark 3.1: Contrast to GNN Applications
  • Remark 3.2: Adjacency Matrix
  • Remark 3.3: Reduction Parameter
  • Remark 4.1: Presenting a Fair Comparison to DAC