GOLD: Graph Out-of-Distribution Detection via Implicit Adversarial Latent Generation

Danny Wang, Ruihong Qiu, Guangdong Bai, Zi Huang

TL;DR

The paper tackles graph OOD detection when real OOD data is unavailable by proposing GOLD, an implicit adversarial framework that synthesizes pseudo-OOD embeddings from in-distribution data. A latent generative model imitates ID embeddings to produce pseudo-OOD representations, while an energy-based detector and a GNN encoder are trained adversarially to maximize the energy gap between ID and pseudo-OOD, thereby simulating OOD exposure without external data. GOLD leverages either a latent diffusion model or a VAE for latent generation and introduces an energy-divergence objective combining an uncertainty loss and a divergence regulariser, achieving state-of-the-art performance among non-OOD-exposed methods and competitive results with real OOD-exposed baselines. The approach demonstrates strong OOD detection performance across five datasets, maintains efficient inference, and highlights the importance of adversarial training and a dedicated energy-detector head for robust separation of ID and OOD signals in graph-structured data.

Abstract

Despite graph neural networks' (GNNs) great success in modelling graph-structured data, out-of-distribution (OOD) test instances still pose a great challenge for current GNNs. One of the most effective techniques to detect OOD nodes is to expose the detector model with an additional OOD node-set, yet the extra OOD instances are often difficult to obtain in practice. Recent methods for image data address this problem using OOD data synthesis, typically relying on pre-trained generative models like Stable Diffusion. However, these approaches require vast amounts of additional data, as well as one-for-all pre-trained generative models, which are not available for graph data. Therefore, we propose the GOLD framework for graph OOD detection, an implicit adversarial learning pipeline with synthetic OOD exposure without pre-trained models. The implicit adversarial training process employs a novel alternating optimisation framework by training: (1) a latent generative model to regularly imitate the in-distribution (ID) embeddings from an evolving GNN, and (2) a GNN encoder and an OOD detector to accurately classify ID data while increasing the energy divergence between the ID embeddings and the generative model's synthetic embeddings. This novel approach implicitly transforms the synthetic embeddings into pseudo-OOD instances relative to the ID data, effectively simulating exposure to OOD scenarios without auxiliary data. Extensive OOD detection experiments are conducted on five benchmark graph datasets, verifying the superior performance of GOLD without using real OOD data compared with the state-of-the-art OOD exposure and non-exposure baselines.
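The abstract refers to energy scores and to an "energy divergence" between ID and synthetic embeddings without spelling either out. As a rough illustration only, the sketch below uses the standard energy score from energy-based OOD detection (the negative log-sum-exp of the classifier logits) together with a squared-hinge margin penalty. The function names, the `temperature` parameter, and the exact form of the penalty are assumptions made for this example and are not taken from the paper; GOLD's own detector head and loss may differ.

```python
import math


def energy_score(logits, temperature=1.0):
    """Negative log-sum-exp of the logits; lower values look more ID-like.

    Assumption: this is the generic energy score, not GOLD's transformed
    energy produced by its detector head.
    """
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    lse = m + math.log(sum(math.exp(z - m) for z in scaled))
    return -temperature * lse


def margin_regulariser(e_id, e_pood, t_id, t_ood):
    """Hypothetical squared-hinge penalty on two batches of energies.

    Penalises ID energies that rise above t_id and pseudo-OOD energies
    that fall below t_ood, which pushes the two groups apart.
    """
    id_term = sum(max(0.0, e - t_id) ** 2 for e in e_id) / len(e_id)
    ood_term = sum(max(0.0, t_ood - e) ** 2 for e in e_pood) / len(e_pood)
    return id_term + ood_term


# A confidently classified node has one dominant logit and low energy;
# a node with flat logits has higher energy.
confident = energy_score([8.0, 0.0, 0.0])
uncertain = energy_score([1.0, 1.0, 1.0])

# Energies already on the correct side of both thresholds incur no penalty.
separated = margin_regulariser([-8.0], [-2.0], t_id=-5.0, t_ood=-3.0)
# Energies on the wrong side are penalised quadratically.
violated = margin_regulariser([-2.0], [-8.0], t_id=-5.0, t_ood=-3.0)
```

In an alternating scheme like the one the abstract describes, a penalty of this kind would be applied while updating the encoder and detector, with the pseudo-OOD energies computed from the generator's synthetic embeddings.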

Paper Structure

This paper contains 46 sections, 24 equations, 8 figures, 23 tables, 1 algorithm.

Figures (8)

  • Figure 1: Motivation of GOLD: The energy distributions, initially close after training the latent generative model (a), become separated after training GOLD (b), where the initial pseudo-OOD (p-OOD) embeddings (embeds.) (c) implicitly diverge from the ID data and resemble real OOD instances (d).
  • Figure 2: Overview of GOLD. Given an input graph, GOLD consists of two components: Step 1 trains a latent generative model using the hidden representation $\mathbf{H}$ from a frozen GNN. Step 2 trains a GNN classifier and an OOD detector based on the ID data $\mathbf{H}$ and the pseudo data $\mathbf{H}_\text{p-OOD}$ produced by the latent generator. The overall training proceeds in an adversarial manner.
  • Figure 3: Transformed energy $\mathbf{e}'$ distribution during adversarial training (Tr.) on the Twitch dataset for in-distribution (ID), pseudo (p-)OOD, and real OOD across epochs. (a) shows that after the LGM is trained to mimic ID data, the energy scores of ID, p-OOD, and OOD overlap in the initial stages. (b) indicates that after training the GNN and the detector to separate the energy of ID and p-OOD, the real OOD energy still cannot be effectively separated from ID. This is similar to the situation under OOD exposure for GNNSafe, as in Figure \ref{fig:twitch_gnnsafe_id_ood}. (c) shows that under adversarial learning, the LGM is updated to generate p-OOD closer to the updated ID data, preventing p-OOD from drifting so far from the ID data that OOD learning becomes ineffective. (d) displays the final energy distribution after convergence, with real OOD and ID well separated and p-OOD and OOD well aligned.
  • Figure 4: Energy score distributions for Twitch and Cora-L with GOLD and GNNSafe++. The vertical green (red) dashed lines represent the thresholds $t_\text{ID}$ ($t_\text{OOD}$) from Eq. \ref{eq: ereg}. $\mathbf{e}$ denotes original energy scores from the GNN, while $\mathbf{e}'$ denotes transformed scores from the detector, with subscripts for ID, OOD, p-OOD (pseudo), and e-OOD (exposed) data. (a) & (d) show that the transformed energy $\mathbf{e}'$ (green and red) diverges further than the original energy $\mathbf{e}$ (blue and orange). (b) & (e) indicate that GOLD aligns the transformed energy $\mathbf{e}'$ of pseudo OOD (red) with that of real OOD (purple) at test time, while the transformed energy $\mathbf{e}'$ of ID (green) remains separated. (c) & (f) demonstrate that the energy separation of test ID (blue) and OOD (pink) in GNNSafe++ is not effective: although the exposed OOD (orange) diverges far from the ID (blue), the real OOD (pink) stays closer to the ID (blue).
  • Figure 5: Logits vs. Softmax joint distribution plot for Twitch dataset.
  • ...and 3 more figures

Theorems & Definitions (1)

  • proof