GDDA: Semantic OOD Detection on Graphs under Covariate Shift via Score-Based Diffusion Models

Zhixia He, Chen Zhao, Minglai Shao, Yujie Lin, Dong Li, Qin Tian

TL;DR

This work proposes Graph Disentangled Diffusion Augmentation (GDDA), a two-phase framework for graph-level semantic OOD detection under covariate shift. It first disentangles graph representations into semantic and style factors, then uses a distribution-shift-controlled score-based diffusion model to generate latent factors outside the training semantic and style spaces.

Abstract

Out-of-distribution (OOD) detection poses a significant challenge for Graph Neural Networks (GNNs), particularly in open-world scenarios with varying distribution shifts. Most existing OOD detection methods on graphs primarily focus on identifying instances in test data domains caused by either semantic shifts (changes in data classes) or covariate shifts (changes in data features), while leaving the simultaneous occurrence of both distribution shifts under-explored. In this work, we address both types of shifts simultaneously and introduce a novel challenge for OOD detection on graphs: graph-level semantic OOD detection under covariate shift. In this scenario, variations between the training and test domains result from the concurrent presence of both covariate and semantic shifts, where only graphs associated with unknown classes are identified as OOD samples (OODs). To tackle this challenge, we propose a novel two-phase framework called Graph Disentangled Diffusion Augmentation (GDDA). The first phase focuses on disentangling graph representations into domain-invariant semantic factors and domain-specific style factors. In the second phase, we introduce a novel distribution-shift-controlled score-based generative diffusion model that generates latent factors outside the training semantic and style spaces. Additionally, auxiliary pseudo-in-distribution (InD) and pseudo-OOD graph representations are employed to enhance the effectiveness of the energy-based semantic OOD detector. Extensive empirical studies on three benchmark datasets demonstrate that our approach outperforms state-of-the-art baselines.
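The abstract mentions an energy-based semantic OOD detector. As a rough illustration only, the sketch below uses the widely used free-energy score computed from classifier logits, $E(x) = -T \log \sum_k \exp(f_k(x)/T)$, where a higher energy suggests an OOD input. Whether GDDA uses exactly this form is an assumption; the function names `energy_score` and `is_ood`, the temperature default, and the threshold are hypothetical and not taken from the paper.

```python
import numpy as np

def energy_score(logits, temperature=1.0):
    """Free energy E(x) = -T * logsumexp(logits / T); lower means more InD-like.

    Assumed form of the score, not confirmed by the paper.
    """
    z = np.asarray(logits, dtype=float) / temperature
    # Subtract the row-wise max before exponentiating for numerical stability.
    m = z.max(axis=-1, keepdims=True)
    lse = m.squeeze(-1) + np.log(np.exp(z - m).sum(axis=-1))
    return -temperature * lse

def is_ood(logits, threshold, temperature=1.0):
    """Flag inputs whose energy exceeds a (hypothetical) threshold as OOD."""
    return energy_score(logits, temperature) > threshold

# A confident prediction has much lower energy than a flat one.
example_logits = [[10.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
print(energy_score(example_logits))
print(is_ood(example_logits, threshold=-5.0))
```

In practice the threshold would be chosen on held-out InD data, for example so that a fixed fraction of InD graphs falls below it.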

Paper Structure

This paper contains 4 sections, 16 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Illustration of InD and semantic OOD graphs under covariate shift. In the GOOD-CMNIST dataset (Gui et al., 2022), digits represent the semantic classes and digit colors represent domain variations. The training graphs span multiple domains (red and yellow), each containing known classes (2 and 6). Since the testing graphs are unknown and inaccessible during training, a semantic OOD detector must distinguish in-distribution graphs (InDs) with known classes from semantic out-of-distribution graphs (OODs) with unknown classes (8 and 3), regardless of domain differences between training and testing.
  • Figure 2: An overview of the GDDA framework, which comprises two phases. (Left) In the first phase, two encoders $\bm E_c$ and $\bm E_s$ disentangle the graph representations $\mathbf h$ into semantic factors $\mathbf c$ and style factors $\mathbf s$. We sample alternative style factors $\mathbf s'$ from a Gaussian distribution and concatenate $\mathbf c$ with $\mathbf s$, as well as $\mathbf c$ with $\mathbf s'$. These concatenated factors are then fed into the decoder $\bm D$ for representation reconstruction. Additionally, the reconstructed representations ${\mathbf h}^{re}$ are fed back into the encoders for factor reconstruction. (Right) In the second phase, the disentangled training factors $\mathbf c$ and $\mathbf s$ are passed to the diffusion models. We apply different perturbations by setting $\lambda_c=0$ and $\lambda_c\neq0$ to derive ${\mathbf c}^{ind}$ and ${\mathbf c}^{ood}$, respectively, which are then concatenated with the perturbed ${\mathbf s}^{ood}$. These concatenated factors are fed into the pre-trained decoder to generate the final pseudo representations.
  • Figure 3: t-SNE visualization of the original, pseudo-InD and pseudo-OOD representations.
  • Figure 4: Ablation results on the three datasets.
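The first-phase data flow described in the Figure 2 caption can be sketched roughly as follows. This is a minimal illustration under strong assumptions, not the authors' implementation: the encoders and decoder are stand-in random linear maps, the dimensions are arbitrary, and the mean-squared-error losses are hypothetical placeholders for whatever objectives the paper actually uses. Only the flow (encode $\mathbf h$ into $\mathbf c$ and $\mathbf s$, sample $\mathbf s'$ from a Gaussian, decode the concatenations, re-encode the reconstruction) follows the caption.

```python
import numpy as np

rng = np.random.default_rng(0)
d_h, d_c, d_s = 16, 8, 4  # hypothetical representation, semantic, and style sizes

# Stand-in parameters; a real model would learn these.
W_c = rng.normal(size=(d_h, d_c)) * 0.1
W_s = rng.normal(size=(d_h, d_s)) * 0.1
W_d = rng.normal(size=(d_c + d_s, d_h)) * 0.1

def E_c(h):
    """Semantic encoder: h -> c."""
    return np.tanh(h @ W_c)

def E_s(h):
    """Style encoder: h -> s."""
    return np.tanh(h @ W_s)

def D(c, s):
    """Decoder: concatenated [c, s] -> reconstructed representation."""
    return np.concatenate([c, s], axis=-1) @ W_d

def phase_one_losses(h, rng):
    c, s = E_c(h), E_s(h)
    s_prime = rng.normal(size=s.shape)   # alternative style from a Gaussian
    h_re = D(c, s)                       # representation reconstruction
    h_re_prime = D(c, s_prime)           # same semantics, alternative style
    rep_loss = np.mean((h_re - h) ** 2)
    # Factor reconstruction: re-encode and compare (assumed MSE form).
    sem_loss = np.mean((E_c(h_re_prime) - c) ** 2)
    sty_loss = np.mean((E_s(h_re_prime) - s_prime) ** 2)
    return rep_loss, sem_loss, sty_loss

h = rng.normal(size=(5, d_h))  # five toy graph representations
print(phase_one_losses(h, rng))
```

The point of decoding $\mathbf c$ with a resampled style is that, if disentanglement works, re-encoding should recover the same semantic factors regardless of which style was attached.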