Graph Out-of-Distribution Detection via Test-Time Calibration with Dual Dynamic Dictionaries

Yue Hou, Ruomei Liu, Yingke Su, Junran Wu, Ke Xu

TL;DR

BaCa tackles graph-level OOD detection without ground-truth OOD data by calibrating OOD scores at test time, modeling the ID/OOD boundary with graphons and discriminative typologies. It partitions test samples by their initial predictions, estimates graphons per subgroup, and uses graphon mixup to synthesize boundary samples. These samples are stored in dual dynamic dictionaries, which feed an attention-based calibration that produces the final score $S_{\text{BaCa}}=S_{\text{Pre}}+\beta\,S_{\text{Attn}}$. An adaptive attention mechanism over a small set of top-$\mathbb{K}$ dictionary entries combines ID and OOD cues to sharpen the ID/OOD boundary, while a dual BCE objective trains the attention heads without updating the pretrained encoder. Empirically, BaCa outperforms state-of-the-art baselines on 10 dataset pairs, with notable gains over GOODAT, and performs robustly across varied graph domains, which supports its practical use for graph OOD detection in open-world settings.
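The score combination above can be illustrated with a minimal sketch. Everything beyond the formula $S_{\text{BaCa}}=S_{\text{Pre}}+\beta\,S_{\text{Attn}}$ is an assumption made for illustration: the function names are hypothetical, cosine similarity stands in for the attention logits, $S_{\text{Attn}}$ is taken to be the OOD readout minus the ID readout, and a higher score is taken to mean "more OOD-like". The trainable attention heads and the dual BCE objective are omitted.

```python
import numpy as np

def topk_attention_readout(query, dictionary, k):
    """Softmax-weighted average of the k largest cosine similarities.

    Illustrative stand-in for attention over the top-K dictionary entries;
    no trainable parameters are used here.
    """
    q = query / (np.linalg.norm(query) + 1e-12)
    d = dictionary / (np.linalg.norm(dictionary, axis=1, keepdims=True) + 1e-12)
    sims = d @ q                       # one similarity per dictionary entry
    k = min(k, sims.shape[0])
    top = np.sort(sims)[-k:]           # keep only the k most similar entries
    weights = np.exp(top - top.max())  # numerically stable softmax
    weights /= weights.sum()
    return float(weights @ top)

def calibrated_score(s_pre, query, id_dict, ood_dict, k=3, beta=0.5):
    """S_BaCa = S_Pre + beta * S_Attn, with an assumed form of S_Attn."""
    # Assumption: S_Attn is positive when the query looks like OOD entries.
    s_attn = (topk_attention_readout(query, ood_dict, k)
              - topk_attention_readout(query, id_dict, k))
    return s_pre + beta * s_attn

# Toy 2-D embeddings: ID entries point along x, OOD entries along y.
id_entries = np.array([[1.0, 0.0], [0.9, 0.1]])
ood_entries = np.array([[0.0, 1.0], [0.1, 0.9]])
id_like = calibrated_score(0.5, np.array([1.0, 0.0]), id_entries, ood_entries, k=2)
ood_like = calibrated_score(0.5, np.array([0.0, 1.0]), id_entries, ood_entries, k=2)
print(id_like, ood_like)
```

In this toy setup the calibration term pushes two samples with the same initial score apart, which is the intended effect of the $\beta$-weighted correction; setting `beta=0` recovers the pre-trained score unchanged.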

Abstract

A key challenge in graph out-of-distribution (OOD) detection lies in the absence of ground-truth OOD samples during training. Existing methods are typically optimized to capture features within the in-distribution (ID) data and calculate OOD scores, which often limits pre-trained models from representing distributional boundaries, leading to unreliable OOD detection. Moreover, the latent structure of graph data is often governed by multiple underlying factors, which remains less explored. To address these challenges, we propose a novel test-time graph OOD detection method, termed BaCa, that calibrates OOD scores using dual dynamically updated dictionaries without fine-tuning the pre-trained model. Specifically, BaCa estimates graphons and applies a mix-up strategy solely with test samples to generate diverse boundary-aware discriminative topologies, eliminating the need to expose auxiliary datasets as outliers. We construct dual dynamic dictionaries via priority queues and attention mechanisms to adaptively capture latent ID and OOD representations, which are then utilized for boundary-aware OOD score calibration. To the best of our knowledge, extensive experiments on real-world datasets show that BaCa significantly outperforms existing state-of-the-art methods in OOD detection.
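A priority-queue dictionary of the kind described above can be sketched with Python's `heapq`. This is only an illustration under stated assumptions: the class name `BoundedDictionary`, the fixed capacity, and the choice of priority (the raw score for the OOD dictionary, its negation for the ID dictionary) are hypothetical, and the paper's actual selection and update rules may differ.

```python
import heapq
import itertools

class BoundedDictionary:
    """Keep the `capacity` entries with the highest priority seen so far."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._heap = []                    # min-heap of (priority, tie, entry)
        self._counter = itertools.count()  # tie-breaker so entries never compare

    def push(self, priority, entry):
        item = (priority, next(self._counter), entry)
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, item)
        elif priority > self._heap[0][0]:
            # Evict the current lowest-priority entry in O(log n).
            heapq.heapreplace(self._heap, item)

    def entries(self):
        """Entries ordered from highest to lowest priority."""
        return [entry for _, _, entry in sorted(self._heap, reverse=True)]

# Dual dictionaries updated from a stream of (score, sample) pairs.
# Assumption: a higher score means "more OOD-like".
id_dict = BoundedDictionary(capacity=2)
ood_dict = BoundedDictionary(capacity=2)
for score, name in [(0.1, "g1"), (0.9, "g2"), (0.5, "g3"), (0.2, "g4")]:
    ood_dict.push(score, name)   # retain the highest-scoring samples
    id_dict.push(-score, name)   # retain the lowest-scoring samples

print(ood_dict.entries())  # ['g2', 'g3']
print(id_dict.entries())   # ['g1', 'g4']
```

A bounded min-heap keeps each update logarithmic in the dictionary size, which matters when the dictionaries are refreshed at every test-time iteration.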

Paper Structure

This paper contains 28 sections, 1 theorem, 18 equations, 15 figures, 5 tables, 1 algorithm.

Key Result

Lemma 1

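Let $F$ be a simple graph and $W, W'$ be graphons. Then $|t(F, W) - t(F, W')| \le \mathrm{e}(F)\,\|W - W'\|_{\square}$, where $t(F, \cdot)$ is the homomorphism density, $\|\cdot\|_{\square}$ is the cut norm, and $\mathrm{e}(F)$ is the number of edges in $F$.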
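As a sanity check of the Counting Lemma, the simplest case $F = K_2$ (a single edge, so $\mathrm{e}(F) = 1$) can be worked out directly. The derivation below assumes the standard definitions of homomorphism density and cut norm; it is an illustration, not part of the paper's proof.

```latex
\begin{align*}
t(K_2, W) &= \int_{[0,1]^2} W(x, y)\,dx\,dy, \\
\|U\|_{\square} &= \sup_{S,\,T \subseteq [0,1]}
  \left| \int_{S \times T} U(x, y)\,dx\,dy \right|, \\
\left| t(K_2, W) - t(K_2, W') \right|
  &= \left| \int_{[0,1]^2} \bigl(W - W'\bigr)(x, y)\,dx\,dy \right|
  \le \|W - W'\|_{\square}
  = \mathrm{e}(K_2)\,\|W - W'\|_{\square}.
\end{align*}
```

The inequality follows by taking $S = T = [0,1]$ in the supremum. For a general simple graph $F$, the bound picks up one such term per edge, which is where the factor $\mathrm{e}(F)$ comes from.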

Figures (15)

  • Figure 1: An example of OOD score distribution and detection performance evolution over test-time iterations on the PTC/MUTAG dataset pair. (a) Before calibration, we dynamically feed the lower left tail of the OOD score distribution into the OOD dictionary and the higher right tail of the ID score distribution into the ID dictionary via two priority queues. (b) After calibration, the overlap between the ID and OOD score distributions is significantly reduced. (c) KL divergence and the loss of attention-based trainable parameters during the first 200 iterations. (d) AUC of test-time OOD detection performance over the first 200 iterations, where Total, Attn, and Base denote our full method with $S_\text{BaCa}$, attention-based calibration with $S_\text{Attn}$, and the pre-trained baseline with $S_{\text{Pre}}$, respectively.
  • Figure 2: Overview of our proposed BaCa framework. (a.1) Given a pre-trained GNN encoder and test samples, we first compute the initial OOD scores and partition the samples into two preliminary subgroups based on the pre-trained model's predictions. (a.2–a.3) Within each subgroup, diverse discriminative typologies are generated via graphon mixup and stored in dual dynamic dictionaries maintained as priority queues. (b.1–b.2) The priority queue–based dictionaries are used to support adaptive, attention-based score calibration. (b.3) The adaptive attention module is optimized during inference to compute the final calibrated OOD score.
  • Figure 3: Sensitivity of calibration to $\beta$.
  • Figure 4: Sensitivity of calibration to $\mathbb{K}$.
  • Figure 5: Estimated graphons and their mixup results on the PTC/MUTAG dataset pair (PTC as ID, MUTAG as OOD). Within each row, the first two columns are the original estimated graphons, and the third column is the mixed graphon.
  • ...and 10 more figures
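
The graphon mixup shown in Figure 5 can be sketched as follows. The sketch assumes that each estimated graphon is represented as a symmetric step-function matrix with entries in $[0, 1]$, that mixing is a convex combination with weight `lam`, and that synthetic graphs are drawn by sampling each edge independently from the mixed graphon. The function names are hypothetical, and the paper's estimator and mixing rule may differ in detail.

```python
import numpy as np

def mix_graphons(w_a, w_b, lam):
    """Convex combination of two step-function graphons of equal resolution."""
    return lam * w_a + (1.0 - lam) * w_b

def sample_graph(graphon, num_nodes, rng):
    """Sample a simple undirected graph from a step-function graphon."""
    res = graphon.shape[0]
    u = rng.random(num_nodes)                        # latent node positions in [0, 1)
    blocks = np.minimum((u * res).astype(int), res - 1)
    probs = graphon[np.ix_(blocks, blocks)]          # pairwise edge probabilities
    coin = rng.random((num_nodes, num_nodes)) < probs
    upper = np.triu(coin, k=1)                       # one draw per pair, no self-loops
    return (upper | upper.T).astype(int)             # symmetrize

rng = np.random.default_rng(0)
w_a = np.array([[0.9, 0.1], [0.1, 0.9]])   # assortative two-block graphon
w_b = np.array([[0.2, 0.8], [0.8, 0.2]])   # disassortative two-block graphon
w_mix = mix_graphons(w_a, w_b, lam=0.5)
adj = sample_graph(w_mix, num_nodes=20, rng=rng)
print(w_mix)
print(adj.sum() // 2, "edges")
```

Varying `lam` moves the mixed graphon between the two source structures, which is one way to obtain samples that lie near the boundary between two subgroups.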

Theorems & Definitions (6)

  • Definition 1: Discriminative Typology
  • Definition 1: Graphon
  • Definition 2: Cut Norm
  • Definition 3: Homomorphism density
  • Lemma 1: Counting Lemma (Lovász, 2012)
  • proof