POWN: Prototypical Open-World Node Classification

Marcel Hoffmann, Lukas Galke, Ansgar Scherp

TL;DR

This work tackles true open-world semi-supervised node classification, in which unlabeled nodes may belong either to known classes or to classes unseen during training. It introduces POWN, an end-to-end prototypical framework that learns class prototypes and combines supervised, self-supervised (Deep Graph Infomax), and pseudo-label losses to classify nodes into known and new classes without data augmentation. The prototypes encode the semantic structure of the classes, and a label-propagation scheme whose edge weights are aligned with embedding distances bridges graph topology and embedding space to produce robust pseudo-labels. Across six benchmarks, POWN outperforms adapted baselines (including GCN, DGI-based methods, and OpenWGL extensions), especially on large graphs. It is also robust to hyperparameter choices and can estimate the number of classes when that number is unknown. The approach advances practical open-world graph learning by classifying new classes in a zero-shot way and by training end-to-end at scale.
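The TL;DR mentions label propagation with edge weights aligned to embedding distances. As a rough illustration only, the sketch below runs classic normalized label propagation (the iteration F ← αSF + (1 − α)Y from Zhou et al.'s local-and-global-consistency method) over a graph whose edge weights come from a Gaussian kernel on embedding distance. The kernel, `sigma`, `alpha`, the iteration count, and the helper names `edge_weights_from_embeddings` and `label_propagation` are hypothetical choices made for this example. POWN's actual weighting and propagation scheme is defined in the paper and may differ.

```python
import numpy as np


def edge_weights_from_embeddings(edges, Z, sigma=1.0):
    # Hypothetical weighting: Gaussian kernel on the Euclidean distance between
    # the embeddings of each edge's endpoints (close embeddings -> weight near 1).
    diff = Z[edges[:, 0]] - Z[edges[:, 1]]
    return np.exp(-np.sum(diff ** 2, axis=1) / (2.0 * sigma ** 2))


def label_propagation(edges, weights, Y0, alpha=0.9, num_iters=50):
    # Y0: (N, C) one-hot rows for labeled nodes, all-zero rows for unlabeled nodes.
    n = Y0.shape[0]
    W = np.zeros((n, n))
    W[edges[:, 0], edges[:, 1]] = weights  # undirected graph: fill both directions
    W[edges[:, 1], edges[:, 0]] = weights
    # Symmetric normalization S = D^{-1/2} W D^{-1/2}; isolated nodes stay at zero.
    deg = W.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    inv_sqrt[nonzero] = 1.0 / np.sqrt(deg[nonzero])
    S = inv_sqrt[:, None] * W * inv_sqrt[None, :]
    # Iterate F <- alpha * S F + (1 - alpha) * Y0, re-injecting the seed labels.
    F = Y0.astype(float)
    for _ in range(num_iters):
        F = alpha * (S @ F) + (1.0 - alpha) * Y0
    # Each node's (pseudo-)label is the class with the largest propagated score.
    return F.argmax(axis=1)


# Toy example: a path graph 0-1-2-3-4-5 whose embeddings form two clusters.
edges = np.array([[0, 1], [1, 2], [2, 3], [3, 4], [4, 5]])
Z = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [5.0, 5.0], [5.1, 5.0]])
Y0 = np.zeros((6, 2))
Y0[0, 0] = 1.0  # node 0 labeled with class 0
Y0[5, 1] = 1.0  # node 5 labeled with class 1
w = edge_weights_from_embeddings(edges, Z)
pred = label_propagation(edges, w, Y0)
print(pred)  # [0 0 0 0 1 1]
```

In this toy graph, the edge between nodes 3 and 4 joins two distant embedding clusters. Its weight is therefore close to zero, and node 3 takes the label of node 0 rather than that of node 5. This is the kind of topology/embedding agreement that distance-aware edge weights are meant to encourage.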

Abstract

We consider the problem of true open-world semi-supervised node classification, in which each node in a graph belongs either to a known class or to a new class, the latter not present during training. Existing methods detect and reject new classes but fail to distinguish between different new classes. We adapt existing methods to this setting and show that they do not solve the problem sufficiently. We introduce a novel end-to-end approach, based on class prototypes, for classification into known and new classes, which we call Prototypical Open-World Learning for Node Classification (POWN). Our method combines graph semi-supervised learning, self-supervised learning, and pseudo-labeling to learn prototype representations of new classes in a zero-shot way. In contrast to existing solutions from the vision domain, POWN does not require data augmentation techniques for node classification. Experiments on benchmark datasets demonstrate the effectiveness of POWN, which outperforms baselines by up to 20% accuracy on the small datasets and by up to 30% on the large datasets. Source code is available at https://github.com/Bobowner/POWN.
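The abstract describes classifying nodes into known and new classes via class prototypes. Here is a minimal sketch of nearest-prototype assignment. It assumes cosine similarity as the score and assumes that the first `num_known` prototypes correspond to known classes, with the remaining ones standing for new classes. Both are assumptions for illustration, and `classify_open_world` is a hypothetical helper, not code from the POWN repository.

```python
import numpy as np


def l2_normalize(X, eps=1e-12):
    # Row-wise L2 normalization, so dot products become cosine similarities.
    return X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), eps)


def classify_open_world(Z, P, num_known):
    # Z: (N, d) node embeddings; P: (K, d) prototypes, where rows
    # 0..num_known-1 are assumed to be known classes and the rest new classes.
    sims = l2_normalize(Z) @ l2_normalize(P).T  # (N, K) cosine similarities
    pred = sims.argmax(axis=1)                  # index of the nearest prototype
    is_new = pred >= num_known                  # True if assigned to a new class
    return pred, is_new


# Toy example: 2 known and 2 new prototypes in a 2-D embedding space.
P = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
Z = np.array([[2.0, 0.1], [0.1, 3.0], [-1.0, 0.2], [0.0, -0.5]])
pred, is_new = classify_open_world(Z, P, num_known=2)
print(pred.tolist(), is_new.tolist())  # [0, 1, 2, 3] [False, False, True, True]
```

Because the argmax ranges over all prototypes, a node can be assigned to a specific new class instead of simply being rejected as unknown. This is the distinction the abstract draws against detect-and-reject methods.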

Paper Structure (43 sections, 9 equations, 5 figures, 10 tables)

Figures (5)

  • Figure 1: Overview of the losses in POWN. The input graph has four classes: two known (red and blue) and two new (orange and green). Nodes without a question mark belong to the labeled set $V_l$, and nodes with a question mark belong to the unlabeled set $V_u$. Stars represent the prototypes in embedding space, and $w_i$ is the label propagation weight.
  • Figure 2: Test accuracy on all classes for variations of the hyperparameters $\lambda$, $\mu$, $\nu$, and $\kappa$ (panels from left to right). The orange dot marks the hyperparameter setting with the highest validation accuracy found by Bayesian search.
  • Figure 3: $t$-SNE embeddings on the Photo dataset for GCN (left) and POWN (right). Known classes are colored gray, and each new class has its own color.
  • Figure 4: The distribution of edge weights before and after training, separated by homophilic and heterophilic edges.
  • Figure 5: Ablation study on the temperature parameters of the supervised and pseudo-label losses.
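Figure 5 ablates the temperatures of the supervised and pseudo-label losses. To show what such a temperature controls, here is a generic temperature-scaled softmax cross-entropy over cosine similarities between node embeddings and prototypes. That POWN's losses take exactly this form is an assumption, and `temperature_softmax_ce` is a hypothetical name. The target labels `y` could equally be ground-truth labels or pseudo-labels.

```python
import numpy as np


def temperature_softmax_ce(Z, P, y, temperature):
    # Cosine similarities between node embeddings Z (N, d) and prototypes P (K, d).
    Zn = Z / np.maximum(np.linalg.norm(Z, axis=1, keepdims=True), 1e-12)
    Pn = P / np.maximum(np.linalg.norm(P, axis=1, keepdims=True), 1e-12)
    logits = (Zn @ Pn.T) / temperature
    # Log-softmax with max subtraction for numerical stability.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Mean negative log-likelihood of the target labels y (true or pseudo-labels).
    loss = -log_probs[np.arange(len(y)), y].mean()
    return loss, np.exp(log_probs)


P = np.eye(3)                    # three prototypes in a 3-D embedding space
Z = np.array([[1.0, 0.2, 0.1]])  # one node, closest to prototype 0
y = np.array([0])
for t in (1.0, 0.1):
    loss, probs = temperature_softmax_ce(Z, P, y, t)
    print(f"temperature={t}: p(class 0)={probs[0, 0]:.3f}, loss={loss:.3f}")
```

Lowering the temperature sharpens the distribution toward the most similar prototype. That sharpening strengthens the gradient signal from confident assignments, which is one reason the temperature can matter for both the supervised and the pseudo-label loss.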