G-OSR: A Comprehensive Benchmark for Graph Open-Set Recognition
Yicong Dong, Rundong He, Guangyao Chen, Wentao Zhang, Zhongyi Han, Jieming Shi, Yilong Yin
TL;DR
G-OSR is a unified benchmark for Graph Open-Set Recognition (GOSR) that evaluates node-level and graph-level open-set tasks across datasets from multiple domains. It places traditional OOD/OSR, GOODD, and GAD methods under standardized experimental settings and finds that methods using auxiliary data or graph-aware strategies (e.g., SGOOD, AAGOD, EMP) generally outperform traditional approaches, especially on graph-level tasks, while post-hoc and anomaly-detection methods often underperform in open-set scenarios. The study also shows that graph structure and dataset complexity significantly influence performance, and that graph-specific methods are more robust as the class space grows. Overall, G-OSR provides a fair, extensible platform for benchmarking GOSR methods and points future work toward graph-aware open-set techniques and foundation-model integration for AI across science domains.
Abstract
Graph Neural Networks (GNNs) have achieved significant success in machine learning, with wide applications in social networks, bioinformatics, knowledge graphs, and other fields. Most research assumes an ideal closed-set environment, in which every test sample belongs to a class seen during training. In real-world open-set environments, however, unseen classes appear at test time and undermine the robustness and reliability of graph learning models. Graph Open-Set Recognition (GOSR) methods are needed to address this problem and make GNNs effective in practical scenarios. Research on GOSR is still in its early stages and lacks a comprehensive benchmark that spans diverse tasks and datasets. Moreover, four lines of work, namely traditional methods, Graph Out-of-Distribution Detection (GOODD), GOSR, and Graph Anomaly Detection (GAD), have mostly evolved in isolation, with little exploration of their interconnections or of how the other three apply to GOSR. To fill these gaps, we introduce G-OSR, a comprehensive benchmark for evaluating GOSR methods at both the node and graph levels. It uses datasets from multiple domains to ensure fair and standardized comparisons of effectiveness and efficiency across traditional, GOODD, GOSR, and GAD methods. The results offer critical insights into the generalizability and limitations of current GOSR methods and provide valuable resources for advancing research in this field through systematic analysis of diverse approaches.
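To make the open-set setting concrete, here is a minimal sketch of a maximum-softmax-probability threshold rule, a generic post-hoc baseline and not a method taken from the G-OSR benchmark. It assumes a classifier trained on the known classes has already produced a vector of logits for one node or graph. The names `open_set_predict`, `UNKNOWN`, and the threshold value of 0.5 are illustrative assumptions.

```python
import math

UNKNOWN = -1  # hypothetical label used here for "not one of the known classes"


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def open_set_predict(logits, threshold=0.5):
    """Return the most likely known class, or UNKNOWN if confidence is low.

    `logits` holds one score per known class for a single node or graph.
    `threshold` is an assumed value; in practice it would be tuned on
    held-out data.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= threshold else UNKNOWN


# A confident prediction is assigned to a known class.
confident = open_set_predict([4.0, 0.0, 0.0])
# A near-uniform prediction is rejected as unknown.
uncertain = open_set_predict([0.1, 0.0, 0.05])
```

A closed-set classifier would always return one of the known classes. The only change here is the rejection branch, which lets the model abstain when no known class is sufficiently likely.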
