SGOOD: Substructure-enhanced Graph-Level Out-of-Distribution Detection
Zhihao Ding, Jieming Shi, Shiqi Shen, Xuequn Shang, Jiannong Cao, Zhipeng Wang, Zhi Gong
TL;DR
SGOOD tackles graph-level out-of-distribution detection by harnessing task-agnostic substructures. It builds a per-graph super graph of substructures and applies a two-level GIN-based encoding to fuse substructure information into graph representations, complemented by substructure-preserving augmentations and a two-stage training objective that blends contrastive and supervised learning. The method is theoretically more expressive than the 1-WL and 2-WL tests, and it empirically outperforms 11 baselines across 8 real-world datasets on OOD metrics while maintaining strong ID performance. By preserving meaningful substructure semantics, this approach offers a principled, scalable way to detect OOD graphs, with practical implications for safety-critical domains.
Abstract
Graph-level representation learning is important in a wide range of applications. Existing graph-level models are generally built on the i.i.d. assumption for both training and testing graphs. However, in an open world, models can encounter out-of-distribution (OOD) testing graphs drawn from distributions that were unknown during training. A trustworthy model should be able to detect OOD graphs to avoid unreliable predictions, while still producing accurate in-distribution (ID) predictions. To achieve this, we present SGOOD, a novel graph-level OOD detection framework. We find that substructure differences commonly exist between ID and OOD graphs, and design SGOOD with a series of techniques to encode task-agnostic substructures for effective OOD detection. Specifically, we build a super graph of substructures for every graph, and develop a two-level graph encoding pipeline that works on both original graphs and super graphs to obtain substructure-enhanced graph representations. We then devise substructure-preserving graph augmentation techniques to capture more substructure semantics of ID graphs. Extensive experiments against 11 competitors on numerous graph datasets demonstrate the superiority of SGOOD, often surpassing existing methods by a significant margin. The code is available at https://github.com/TommyDzh/SGOOD.
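To make the idea of a "super graph of substructures" concrete, the following is a minimal sketch, not the authors' implementation. It assumes the substructure assignment (`membership`, a hypothetical node-to-substructure mapping) has already been produced by some task-agnostic detection method, and it assumes that two substructures are linked whenever an original edge crosses between them. The function name `build_super_graph` is illustrative only.

```python
def build_super_graph(edges, membership):
    """Collapse a graph into a super graph of substructures.

    edges:      iterable of (u, v) node pairs of the original graph
    membership: dict mapping each node to a substructure id
                (assumed to be given by a task-agnostic detector)
    Returns (super_nodes, super_edges) as sorted lists.
    """
    super_nodes = sorted(set(membership.values()))
    super_edges = set()
    for u, v in edges:
        su, sv = membership[u], membership[v]
        # Assumption: an original edge crossing two substructures
        # yields one undirected super edge between them.
        if su != sv:
            super_edges.add((min(su, sv), max(su, sv)))
    return super_nodes, sorted(super_edges)


# Toy graph: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
membership = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}

super_nodes, super_edges = build_super_graph(edges, membership)
print(super_nodes, super_edges)  # [0, 1] [(0, 1)]
```

In a two-level pipeline of the kind the abstract describes, node representations from the original graph would be pooled per substructure and then passed through a second encoder that operates on this super graph; that encoding step is omitted here.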
