Novel Node Category Detection Under Subpopulation Shift
Hsing-Huan Chung, Shravan Chaudhari, Yoav Wald, Xing Han, Joydeep Ghosh
TL;DR
This work tackles novel node category detection under subpopulation shift in attributed graphs by formulating it as a positive-unlabeled (PU) learning problem, since no ground-truth labels for the novel categories are available. It introduces RECO-SLIP, a framework that combines recall-constrained optimization with a sample-efficient, graph-aware selective link prediction objective designed to preserve the latent subgroup structure induced by edges. Empirical results on five benchmark datasets show that RECO-SLIP consistently outperforms standard PU methods, propensity-weighting approaches, and graph PU baselines, indicating robustness to subpopulation shift. The approach is aimed at safety-critical graph applications, and the code is publicly available for reproducibility.
Abstract
In real-world graph data, distribution shifts can manifest in various ways, such as the emergence of new categories and changes in the relative proportions of existing categories. It is often important to detect nodes of novel categories under such distribution shifts for safety or insight discovery purposes. We introduce a new approach, Recall-Constrained Optimization with Selective Link Prediction (RECO-SLIP), to detect nodes belonging to novel categories in attributed graphs under subpopulation shifts. By integrating a recall-constrained learning framework with a sample-efficient link prediction mechanism, RECO-SLIP addresses the dual challenges of resilience against subpopulation shifts and the effective exploitation of graph structure. Our extensive empirical evaluation across multiple graph datasets demonstrates the superior performance of RECO-SLIP over existing methods. The experimental code is available at https://github.com/hsinghuan/novel-node-category-detection.
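To make the idea of a recall constraint concrete, the following is a minimal sketch of recall-constrained thresholding in a generic PU setting. It is not the RECO-SLIP objective or the authors' implementation. It assumes that some model has already produced a "known-category" score for every node, and the names `recall_constrained_threshold`, `flag_novel`, and `target_recall` are hypothetical, introduced only for illustration.

```python
import math


def recall_constrained_threshold(known_scores, target_recall):
    """Return the largest threshold t such that at least `target_recall`
    of the labeled known-category nodes have score >= t.

    Illustrative assumption: higher score means "more likely a known category".
    """
    if not known_scores:
        raise ValueError("known_scores must be non-empty")
    if not 0.0 < target_recall <= 1.0:
        raise ValueError("target_recall must lie in (0, 1]")
    ranked = sorted(known_scores, reverse=True)
    # Smallest number of known nodes that must be retained to meet the
    # constraint; the small epsilon guards against floating-point round-up.
    k = max(1, math.ceil(target_recall * len(ranked) - 1e-9))
    return ranked[k - 1]


def flag_novel(unlabeled_scores, threshold):
    """Flag unlabeled nodes whose score falls below the threshold as novel."""
    return [score < threshold for score in unlabeled_scores]


# Toy scores, made up for illustration only.
known_scores = [0.9, 0.8, 0.7, 0.6, 0.2]
threshold = recall_constrained_threshold(known_scores, target_recall=0.8)
novel_flags = flag_novel([0.95, 0.5, 0.1, 0.6], threshold)
```

In this toy setup the threshold is fixed by the labeled known-category nodes alone, so it does not depend on the unknown proportion of novel nodes among the unlabeled ones. That is the intuition behind using a recall constraint when subgroup proportions shift; the actual method learns the detector under such a constraint rather than thresholding fixed scores.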
