Active Few-Shot Learning for Vertex Classification Starting from an Unlabeled Dataset

Felix Burr, Marcel Hoffmann, Ansgar Scherp

TL;DR

This work tackles vertex classification on unlabeled graphs under a fixed annotation budget by introducing an active few-shot learning framework that replaces the unavailable class oracle with k-medoids-based pseudo-labels. It compares discriminative (GCN-based) and prototypical (Graph Prototypical Network) models, showing that prototypical networks are more robust when labeled data are scarce and that medoid-based sampling yields strong gains across settings. Label propagation, which exploits graph homophily, further improves performance, particularly in early iterations, and exact class cardinalities need not be known in advance: estimating them has minimal impact on accuracy. The approach enables effective vertex labeling from scratch and provides practical guidance for label-efficient graph learning; public code is available for replication.
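The TL;DR credits label propagation, which relies on homophily (adjacent vertices tend to share a class), with additional gains. The exact propagation rule used in the paper is not given here, so the following is only a minimal sketch, assuming the classic iterative scheme: every vertex repeatedly takes the average class scores of its neighbours, while annotated vertices stay clamped to their human-provided labels. The function `propagate_labels` and its parameters are hypothetical names chosen for illustration.

```python
import numpy as np


def propagate_labels(adj: np.ndarray, seed_labels: dict[int, int],
                     num_classes: int, num_iters: int = 50) -> np.ndarray:
    """Iterative label propagation with clamped seeds (illustrative sketch).

    adj: symmetric (n, n) adjacency matrix.
    seed_labels: annotated vertex index -> class index.
    Returns the predicted class index of every vertex.
    """
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    # Row-normalize so each update averages the scores of a vertex's neighbours;
    # isolated vertices (degree 0) keep an all-zero row.
    deg = adj.sum(axis=1, keepdims=True)
    transition = np.divide(adj, deg, out=np.zeros_like(adj), where=deg > 0)

    # One-hot class scores for annotated vertices, zeros everywhere else.
    idx = np.fromiter(seed_labels.keys(), dtype=int)
    cls = np.fromiter(seed_labels.values(), dtype=int)
    seeds = np.zeros((n, num_classes))
    seeds[idx, cls] = 1.0

    scores = seeds.copy()
    for _ in range(num_iters):
        scores = transition @ scores   # spread scores to neighbours
        scores[idx] = seeds[idx]       # clamp annotated vertices to their labels
    return scores.argmax(axis=1)


# Usage: a 6-vertex path 0-1-2-3-4-5, vertex 0 labeled class 0, vertex 5 class 1.
adj = np.zeros((6, 6))
for i in range(5):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
pred = propagate_labels(adj, {0: 0, 5: 1}, num_classes=2)
print(pred.tolist())  # [0, 0, 0, 1, 1, 1]
```

On the path graph the scores converge to a linear interpolation between the two labeled endpoints, so each unlabeled vertex takes the class of the nearer annotated vertex, which is the homophily assumption in its simplest form.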

Abstract

Despite the ample availability of graph data, obtaining vertex labels is a tedious and expensive task. Therefore, it is desirable to learn from only a few labeled vertices. Existing few-shot learners assume a class oracle, which provides labeled vertices for a desired class. However, such an oracle is not available in a real-world setting: when a vertex is drawn for labeling, its class is unknown. Few-shot learners are often combined with prototypical networks, while classical semi-supervised vertex classification uses discriminative models, e.g., Graph Convolutional Networks (GCN). In this paper, we train our models by iteratively prompting a human annotator with vertices to annotate. We perform three experiments in which we progressively relax our assumptions. First, we assume a class oracle, i.e., the human annotator is provided with an equal number of vertices to label for each class. We denote this as "Balanced Sampling". In the subsequent experiment, "Unbalanced Sampling", we replace the class oracle with $k$-medoids clustering and draw vertices to label from the clusters. In the last experiment, "Unknown Number of Classes", we no longer assume that the number and distribution of classes are known. Our results show that prototypical models outperform discriminative models in all experiments when fewer than $20$ samples per class are available. While dropping the class-oracle assumption in the "Unbalanced Sampling" experiment reduces the performance of the GCN by $9\%$, the prototypical network loses only $1\%$ on average. In the "Unknown Number of Classes" experiment, the average performance of both models decreases by a further $1\%$. Source code: https://github.com/Ximsa/2023-felix-ma
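In the "Unbalanced Sampling" experiment, $k$-medoids clustering stands in for the class oracle and vertices to annotate are drawn from the clusters. The abstract does not specify the $k$-medoids variant, which vertex representation is clustered, or how vertices are drawn within a cluster, so the sketch below rests on stated assumptions: it clusters given vertex embeddings `x` with the simple alternating (Voronoi-iteration) variant and a deterministic farthest-first initialization, then proposes each medoid plus the cluster members nearest to it. Both `k_medoids` and `select_for_annotation` are hypothetical names; random draws per cluster would be an equally plausible reading of the text.

```python
import numpy as np


def k_medoids(x: np.ndarray, k: int, max_iters: int = 100):
    """Alternating (Voronoi-iteration) k-medoids on Euclidean distances.

    Returns (medoid indices, cluster index per point).
    """
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)  # (n, n)
    # Deterministic farthest-first initialization (an illustrative choice,
    # not PAM's BUILD step): start at the 1-medoid, then add the point
    # farthest from all medoids chosen so far.
    medoids = [int(dist.sum(axis=1).argmin())]
    while len(medoids) < k:
        medoids.append(int(dist[:, medoids].min(axis=1).argmax()))
    medoids = np.array(medoids)

    for _ in range(max_iters):
        assign = dist[:, medoids].argmin(axis=1)  # nearest medoid per point
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(assign == c)
            if members.size:
                # New medoid: the member with the smallest total distance
                # to the other members of its cluster.
                costs = dist[np.ix_(members, members)].sum(axis=1)
                new_medoids[c] = members[costs.argmin()]
        if np.array_equal(new_medoids, medoids):
            break  # converged: medoids no longer change
        medoids = new_medoids
    return medoids, dist[:, medoids].argmin(axis=1)


def select_for_annotation(x: np.ndarray, k: int, per_cluster: int) -> list[int]:
    """Hypothetical helper: per cluster, propose the medoid and its nearest members."""
    medoids, assign = k_medoids(x, k)
    picks = []
    for c, m in enumerate(medoids):
        members = np.flatnonzero(assign == c)
        # Order members by distance to their medoid (the medoid has distance 0).
        d = np.linalg.norm(x[members] - x[m], axis=1)
        picks.extend(members[np.argsort(d, kind="stable")][:per_cluster].tolist())
    return picks


# Usage: two well-separated groups of three 2-D "embeddings".
x = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
print(select_for_annotation(x, k=2, per_cluster=1))  # [3, 0]
```

Because the annotator's answers need not line up one-to-one with the clusters, a labeled set gathered this way is in general not balanced across classes, unlike the oracle-driven "Balanced Sampling" setting.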

Paper Structure

This paper contains 16 sections, 3 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: The embedding space of each class partition for all experiments, where each color represents a (potential) class.
  • Figure 2: Test accuracy on Cora, assuming perfect clustering
  • Figure 3: Test accuracy on Cora, using $k$-medoids clustering
  • Figure 4: Test accuracy on Cora, using $k$-medoids clustering and estimating classes