NC-NCD: Novel Class Discovery for Node Classification

Yue Hou, Xueyuan Chen, He Zhu, Romei Liu, Bowen Shi, Jiaheng Liu, Junran Wu, Ke Xu

TL;DR

This paper defines a practical Novel Class Discovery for Node Classification (NC-NCD) setting in which unlabeled new classes appear on a graph after the old classes have been learned, and old-class data is no longer available. It introduces SWORD, a self-training framework that combines pairwise-similarity pseudo-labeling, a task-agnostic joint classifier, prototype replay, and feature distillation to discover new categories while preserving old ones. Empirical results on four benchmarks across multiple GNN backbones show that SWORD achieves balanced and superior performance on old, new, and all categories compared to state-of-the-art baselines, with ablations confirming the importance of prototype replay, distillation, and self-training. The work advances practical continual learning on graphs by enabling robust NC-NCD without task IDs, highlighting the method’s potential for scalable, real-world graph analysis and future extensions to unknown novel-class counts and multi-stage settings.

Abstract

Novel Class Discovery (NCD) involves identifying new categories within unlabeled data by utilizing knowledge acquired from previously established categories. However, existing NCD methods often struggle to maintain a balance between the performance of old and new categories. Discovering unlabeled new categories in a class-incremental way is more practical but also more challenging, as it is frequently hindered by either catastrophic forgetting of old categories or an inability to learn new ones. Furthermore, the implementation of NCD on continuously scalable graph-structured data remains an under-explored area. In response to these challenges, we introduce for the first time a more practical NCD scenario for node classification (i.e., NC-NCD), and propose a novel self-training framework with prototype replay and distillation called SWORD, adapted to our NC-NCD setting. Our approach enables the model to cluster unlabeled new category nodes after learning labeled nodes while preserving performance on old categories without reliance on old category nodes. SWORD achieves this by employing a self-training strategy to learn new categories and preventing the forgetting of old categories through the joint use of feature prototypes and knowledge distillation. Extensive experiments on four common benchmarks demonstrate the superiority of SWORD over other state-of-the-art methods.
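
To make the three ingredients named in the abstract more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that a feature prototype is the mean embedding of each labeled old class, that pairwise pseudo-labels come from thresholding cosine similarity, and that distillation is a squared-error penalty between features of the frozen old encoder and the current one. The function names, the threshold value, and the toy arrays are all hypothetical; the paper's actual losses may differ.

```python
import numpy as np


def class_prototypes(features, labels):
    """Assumed prototype definition: mean feature vector of each old class."""
    return {int(c): features[labels == c].mean(axis=0) for c in np.unique(labels)}


def pairwise_pseudo_labels(features, threshold=0.9):
    """Assumed rule: label a node pair 1 if cosine similarity exceeds threshold."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)  # guard against zero vectors
    similarity = unit @ unit.T
    return (similarity > threshold).astype(np.int64)


def feature_distillation_loss(old_features, new_features):
    """Assumed form: mean squared L2 distance between old and current features."""
    return float(np.mean(np.sum((old_features - new_features) ** 2, axis=1)))


# Toy embeddings standing in for GNN encoder outputs (hypothetical data).
old_feats = np.array([[1.0, 0.0], [0.8, 0.2], [0.0, 1.0], [0.2, 0.8]])
old_labels = np.array([0, 0, 1, 1])
new_feats = np.array([[1.0, 1.0], [2.0, 2.1], [-1.0, 1.0]])

# Prototypes are stored after pre-training so old-class nodes need not be kept.
prototypes = class_prototypes(old_feats, old_labels)

# Pairwise pseudo-labels supervise the unlabeled new-class nodes.
pairs = pairwise_pseudo_labels(new_feats, threshold=0.9)

# Distillation penalizes drift of the current encoder from the frozen one.
drift = feature_distillation_loss(new_feats, new_feats + 0.1)
```

In this sketch the stored prototypes would be replayed through the classifier in place of old-class nodes, which is one way to read "without reliance on old category nodes".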

Paper Structure (44 sections, 14 equations, 6 figures, 7 tables, 1 algorithm)

Figures (6)

  • Figure 1: An illustration of the different settings in novel class discovery and incremental learning tasks.
  • Figure 2: Overall architecture of SWORD. (a) Pre-training phase, a GNN encoder that extracts node representations is trained on labeled old classes and feature prototypes are recorded for the NCD task; (b) NCD-training phase, SWORD learns unlabeled new category nodes and prevents forgetting through self-training with prototype replay and distillation.
  • Figure 3: Sensitivity analysis w.r.t. parameters $\alpha_1$, $\alpha_2$, $\eta$, and $\lambda$.
  • Figure 4: Comparison of confusion matrices for different methods.
  • Figure 5: t-SNE embedding visualization.
  • ...and 1 more figure