Graph Out-of-Distribution Detection via Test-Time Calibration with Dual Dynamic Dictionaries
Yue Hou, Ruomei Liu, Yingke Su, Junran Wu, Ke Xu
TL;DR
BaCa tackles graph-level OOD detection without ground-truth OOD data by calibrating OOD scores at test time, modeling the ID/OOD boundary with graphons and discriminative topologies. It partitions test samples by their initial predictions, estimates a graphon per subgroup, and applies graphon mixup to synthesize boundary samples, which are stored in dual dynamic dictionaries. These dictionaries feed an attention-based calibration that produces the final score $S_{\text{BaCa}}=S_{\text{Pre}}+\beta\,S_{\text{Attn}}$. An adaptive attention mechanism over a small set of top-$\mathbb{K}$ dictionary entries combines ID and OOD cues to sharpen the ID/OOD boundary, while a dual BCE objective trains the attention heads without updating the pretrained encoder. Empirically, BaCa outperforms state-of-the-art baselines on 10 dataset pairs, with notable gains over GOODAT and robust performance across varied graph domains, which supports its practical utility for reliable graph OOD detection in open-world scenarios.
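As a rough illustration of the score combination $S_{\text{BaCa}}=S_{\text{Pre}}+\beta\,S_{\text{Attn}}$, the sketch below calibrates a pretrained score with attention over the top-$\mathbb{K}$ entries of an ID dictionary and an OOD dictionary. Everything beyond the additive form is an assumption made for illustration: the cosine similarity, the softmax weighting, the definition of the attention term as "OOD evidence minus ID evidence", the convention that a higher score means more OOD-like, and all function names. It is not the learned attention heads described by the authors.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-12)

def topk_attention_score(z, dictionary, k):
    """Softmax-weighted mean similarity of z to its k most similar entries.

    Hypothetical stand-in for a learned attention head.
    """
    sims = sorted((cosine(z, e) for e in dictionary), reverse=True)[:k]
    if not sims:
        return 0.0
    m = max(sims)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in sims]
    total = sum(weights)
    return sum(w * s for w, s in zip(weights, sims)) / total

def calibrated_score(s_pre, z, id_dict, ood_dict, k=2, beta=0.5):
    """S_BaCa = S_Pre + beta * S_Attn (higher = more OOD, by assumption)."""
    # Assumed form of S_Attn: evidence from the OOD dictionary minus
    # evidence from the ID dictionary.
    s_attn = topk_attention_score(z, ood_dict, k) - topk_attention_score(z, id_dict, k)
    return s_pre + beta * s_attn

# Toy embeddings: ID entries cluster near [1, 0], OOD entries elsewhere.
id_dict = [[1.0, 0.0], [0.9, 0.1]]
ood_dict = [[0.0, 1.0], [-1.0, 0.0]]
id_like = calibrated_score(0.5, [1.0, 0.0], id_dict, ood_dict)
ood_like = calibrated_score(0.5, [0.0, 1.0], id_dict, ood_dict)
```

In this toy setup the calibration pushes the ID-like sample below its initial score and the OOD-like sample above it, while setting `beta=0` recovers the pretrained score unchanged.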
Abstract
A key challenge in graph out-of-distribution (OOD) detection is the absence of ground-truth OOD samples during training. Existing methods are typically optimized to capture features of the in-distribution (ID) data and to compute OOD scores from them, which limits the ability of pre-trained models to represent distributional boundaries and leads to unreliable OOD detection. Moreover, the latent structure of graph data is often governed by multiple underlying factors, an aspect that remains underexplored. To address these challenges, we propose a novel test-time graph OOD detection method, termed BaCa, that calibrates OOD scores using dual dynamically updated dictionaries without fine-tuning the pre-trained model. Specifically, BaCa estimates graphons and applies a mix-up strategy using only test samples to generate diverse boundary-aware discriminative topologies, eliminating the need to expose auxiliary datasets as outliers. We construct dual dynamic dictionaries via priority queues and attention mechanisms to adaptively capture latent ID and OOD representations, which are then used for boundary-aware OOD score calibration. Extensive experiments on real-world datasets show that BaCa significantly outperforms existing state-of-the-art methods in OOD detection.
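The two building blocks named above, graphon mix-up and a priority-queue dictionary, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the graphon is approximated by a small step-function matrix with one node per block (a full sampler would draw latent positions uniformly from $[0,1]$), the priority value is a placeholder whose definition the abstract does not specify, and `mix_graphons`, `sample_graph`, and `BoundedDictionary` are hypothetical names.

```python
import heapq
import random

def mix_graphons(w_a, w_b, lam):
    """Convex combination lam * W_a + (1 - lam) * W_b of two step-function graphons."""
    return [[lam * a + (1 - lam) * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(w_a, w_b)]

def sample_graph(w, rng):
    """Sample an undirected simple graph: edge (i, j) appears with probability w[i][j].

    Simplification: one node per graphon block, no latent-position sampling.
    """
    n = len(w)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < w[i][j]:
                adj[i][j] = adj[j][i] = 1
    return adj

class BoundedDictionary:
    """Fixed-capacity dictionary that keeps the highest-priority entries (min-heap)."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._heap = []
        self._counter = 0  # tie-breaker so stored items are never compared

    def push(self, priority, item):
        entry = (priority, self._counter, item)
        self._counter += 1
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, entry)
        elif priority > self._heap[0][0]:
            # Evict the current lowest-priority entry.
            heapq.heapreplace(self._heap, entry)

    def items(self):
        """Stored items, highest priority first."""
        return [item for _, _, item in sorted(self._heap, reverse=True)]

# Mix a sparse and a dense graphon, then sample one boundary-like graph.
w_sparse = [[0.0, 1.0], [1.0, 0.0]]
w_dense = [[1.0, 1.0], [1.0, 1.0]]
w_mix = mix_graphons(w_sparse, w_dense, 0.5)
graph = sample_graph(w_mix, random.Random(0))

# Keep only the two highest-priority entries.
ood_dictionary = BoundedDictionary(capacity=2)
for priority, name in [(0.1, "a"), (0.9, "b"), (0.5, "c")]:
    ood_dictionary.push(priority, name)
```

A min-heap is used so that the weakest stored entry is always at the root and can be evicted in logarithmic time when a stronger candidate arrives, which keeps the dictionary size fixed as test samples stream in.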
