POWN: Prototypical Open-World Node Classification
Marcel Hoffmann, Lukas Galke, Ansgar Scherp
TL;DR
This work tackles true open-world semi-supervised node classification, where unlabeled nodes may belong to known or unseen classes. It introduces POWN, an end-to-end prototypical framework that learns class prototypes and uses a combination of supervised, self-supervised (via Deep Graph Infomax), and pseudo-label losses to classify nodes into known and new classes without data augmentation. Prototypes encode semantic class structure, and a label-propagation scheme with edge weights aligned to embedding distances bridges topology and embedding space for robust pseudo-labels. Across six benchmarks, POWN outperforms adapted baselines (including GCN, DGI-based methods, and OpenWGL extensions), especially on large graphs, and demonstrates robustness to hyperparameters and the ability to estimate the number of classes when unknown. The approach advances practical open-world graph learning with zero-shot new-class classification and scalable, end-to-end training.
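The idea of label propagation with edge weights tied to the embedding space can be illustrated with a small NumPy sketch. It is not the POWN implementation: the function name `propagate_labels`, the `exp(cosine similarity)` weighting, the clamping of seed labels, and the fixed iteration count are all assumptions chosen for illustration.

```python
import numpy as np

def propagate_labels(edges, embeddings, seed_labels, num_classes, num_iters=50):
    """Hypothetical sketch: weighted label propagation on an undirected graph.

    edges:       list of (u, v) node-index pairs
    embeddings:  (n, d) array of node embeddings
    seed_labels: (n,) int array, -1 marks nodes without a label
    """
    n = embeddings.shape[0]
    # Normalise embeddings so that dot products are cosine similarities.
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)

    # Assumed weighting: edges between nodes with similar embeddings get
    # larger (strictly positive) weights.
    w = np.zeros((n, n))
    for u, v in edges:
        w[u, v] = w[v, u] = np.exp(float(z[u] @ z[v]))

    # Row-normalise to obtain a transition matrix; isolated nodes keep zero rows.
    deg = w.sum(axis=1, keepdims=True)
    p = np.divide(w, deg, out=np.zeros_like(w), where=deg > 0)

    # One-hot encode the seed labels.
    seeds = np.flatnonzero(seed_labels >= 0)
    y = np.zeros((n, num_classes))
    y[seeds, seed_labels[seeds]] = 1.0

    # Iteratively average neighbour scores, clamping the seeds each round.
    f = y.copy()
    for _ in range(num_iters):
        f = p @ f
        f[seeds] = y[seeds]
    return f.argmax(axis=1)
```

In this sketch, an edge that connects two nodes far apart in embedding space carries less label mass than an edge inside a tight cluster, which is one simple way for pseudo-labels to respect both topology and embeddings.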
Abstract
We consider the problem of \textit{true} open-world semi-supervised node classification, in which nodes in a graph belong either to known classes or to new classes, with the latter not present during training. Existing methods detect and reject new classes but fail to distinguish between different new classes. We adapt existing methods and show that they do not solve the problem sufficiently. We introduce a novel end-to-end approach for classification into known classes and new classes based on class prototypes, which we call Prototypical Open-World Learning for Node Classification (POWN). Our method combines graph semi-supervised learning, self-supervised learning, and pseudo-labeling to learn prototype representations of new classes in a zero-shot way. In contrast to existing solutions from the vision domain, POWN does not require data augmentation techniques for node classification. Experiments on benchmark datasets demonstrate the effectiveness of POWN: it outperforms baselines by up to $20\%$ in accuracy on the small datasets and by up to $30\%$ on the large datasets. Source code is available at https://github.com/Bobowner/POWN.
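To make the notion of prototype-based classification concrete, the following NumPy sketch assigns each node to its most similar class prototype, where some prototypes stand for known classes and others for new classes. It is only an illustration under stated assumptions: the function name `classify_by_prototype`, the cosine-similarity scoring, and the `temperature` parameter are hypothetical and are not taken from the POWN source code, and the sketch omits how prototypes are learned.

```python
import numpy as np

def classify_by_prototype(embeddings, prototypes, temperature=0.1):
    """Hypothetical sketch: soft assignment of nodes to class prototypes.

    embeddings: (n, d) array of node embeddings
    prototypes: (k, d) array, one row per known or new class
    Returns the predicted class index per node and the (n, k) probabilities.
    """
    # Normalise both sides so that the dot product is a cosine similarity.
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    c = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)

    # Temperature-scaled similarities act as logits (assumed scoring rule).
    logits = (z @ c.T) / temperature

    # Numerically stable softmax over the prototypes.
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs = probs / probs.sum(axis=1, keepdims=True)
    return probs.argmax(axis=1), probs
```

Because a new class is represented by a prototype in the same way as a known class, a node can be assigned to a specific new class rather than only being rejected as unknown.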
