Optimizing OOD Detection in Molecular Graphs: A Novel Approach with Diffusion Models

Xu Shen, Yili Wang, Kaixiong Zhou, Shirui Pan, Xin Wang

TL;DR

The paper tackles out-of-distribution (OOD) detection in molecular graphs by introducing PGR-MOOD, a diffusion-model-based approach that uses prototypical graphs and the Fused Gromov-Wasserstein (FGW) distance to distinguish in-distribution (ID) from OOD samples. It overcomes key bottlenecks of naive diffusion-based reconstruction by generating a fixed set of prototypical graphs during training and scoring test graphs against this prototype set, which improves both detection accuracy and efficiency. Empirical results across ten datasets show state-of-the-art performance, with gains in AUC and AUPR and reductions in FPR95, while training time and memory usage are lower than those of prior diffusion-based methods. The approach supports scalable, reliable OOD detection in drug discovery and other molecular graph applications.
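The TL;DR reports AUC, AUPR, and FPR95. As a minimal sketch of how two of these are typically computed, the block below assumes that a higher score means "more likely OOD" and uses one common FPR95 convention (ID treated as positive: the threshold retains at least 95% of ID samples, and FPR95 is the fraction of OOD samples also accepted as ID). The paper's exact convention is not stated in this summary, and the function names are illustrative.

```python
import math


def auroc(id_scores, ood_scores):
    """AUROC: probability that a random OOD sample outscores a random ID sample (ties count 1/2)."""
    wins = 0.0
    for o in ood_scores:
        for s in id_scores:
            wins += 1.0 if o > s else 0.5 if o == s else 0.0
    return wins / (len(ood_scores) * len(id_scores))


def fpr_at_tpr(id_scores, ood_scores, tpr=0.95):
    """FPR at a given ID true-positive rate (assumed ID-as-positive convention).

    The threshold is the smallest ID score that keeps at least `tpr` of the ID
    samples at or below it; FPR is the fraction of OOD samples at or below it too.
    """
    ranked = sorted(id_scores)
    threshold = ranked[math.ceil(tpr * len(ranked)) - 1]
    return sum(1 for o in ood_scores if o <= threshold) / len(ood_scores)


# Toy scores (made up for illustration, not results from the paper).
id_scores = [0.1, 0.5]
ood_scores = [0.4, 0.9]
print(auroc(id_scores, ood_scores))       # 0.75
print(fpr_at_tpr(id_scores, ood_scores))  # 0.5
```

The pairwise AUROC is quadratic in the number of samples, which is fine for illustration; rank-based formulas give the same value in O(n log n).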

Abstract

Open-world test data are often mixed with out-of-distribution (OOD) samples, on which deployed models struggle to make accurate predictions. Traditional detection methods must trade off OOD detection against in-distribution (ID) classification performance because both tasks share the same representation learning model. In this work, we propose to detect OOD molecules with an auxiliary diffusion-model-based framework that measures the similarity between each input molecule and its reconstructed graph. Because the generative model is biased toward reconstructing ID training samples, OOD molecules receive much lower similarity scores, which makes them easier to detect. Although conceptually simple, this vanilla framework faces two significant challenges in practical detection applications. First, popular similarity metrics based on Euclidean distance ignore the complex graph structure. Second, the generative model's iterative denoising steps are time-consuming, especially when run over an enormous pool of drug molecules. To address these challenges, we propose Prototypical Graph Reconstruction for Molecular OOD Detection (PGR-MOOD), which hinges on three innovations: i) an effective metric that comprehensively quantifies how well input and reconstructed molecules match; ii) a graph generator that constructs prototypical graphs aligned with the ID data but far from OOD data; iii) an efficient and scalable OOD detector that compares test samples with pre-constructed prototypical graphs, avoiding the generative process for every new molecule. Extensive experiments on ten benchmark datasets against six baselines demonstrate its superiority.
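The first challenge (Euclidean metrics ignore graph structure) is what an FGW distance addresses: it couples the nodes of two graphs so that both node-feature costs and pairwise-structure mismatches are small. The sketch below is not the authors' code. It swaps in an entropically regularized solver (Sinkhorn projections of the FGW gradient, in the style of Peyré et al.), and the node features, adjacency matrices as structure, square loss, alpha=0.5 and eps=0.05 are all illustrative assumptions. Exact solvers are available in libraries such as POT.

```python
import math


def sinkhorn(a, b, cost, eps, n_iter=300):
    """Entropic optimal-transport plan between marginals a and b (plain Sinkhorn)."""
    n, m = len(a), len(b)
    # Shifting the cost by a constant leaves the plan unchanged; it only avoids underflow.
    shift = min(min(row) for row in cost)
    K = [[math.exp(-(cost[i][j] - shift) / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


def structure_term(C1, C2, T):
    """S[i][j] = sum_{k,l} (C1[i][k] - C2[j][l])**2 * T[k][l] (square structure loss)."""
    n, m = len(C1), len(C2)
    return [[sum((C1[i][k] - C2[j][l]) ** 2 * T[k][l]
                 for k in range(n) for l in range(m))
             for j in range(m)] for i in range(n)]


def fgw_objective(M, C1, C2, T, alpha):
    """(1 - alpha) * <M, T> + alpha * sum_{ijkl} (C1_ik - C2_jl)^2 * T_ij * T_kl."""
    S = structure_term(C1, C2, T)
    pairs = [(i, j) for i in range(len(C1)) for j in range(len(C2))]
    feat = sum(M[i][j] * T[i][j] for i, j in pairs)
    struct = sum(S[i][j] * T[i][j] for i, j in pairs)
    return (1 - alpha) * feat + alpha * struct


def entropic_fgw(X1, C1, X2, C2, alpha=0.5, eps=0.05, outer=20):
    """Approximate FGW distance between two attributed graphs.

    X1, X2: per-node feature vectors; C1, C2: symmetric structure matrices.
    """
    n, m = len(X1), len(X2)
    p, q = [1.0 / n] * n, [1.0 / m] * m  # uniform node weights (assumption)
    # Feature cost: squared Euclidean distance between node feature vectors.
    M = [[sum((x - y) ** 2 for x, y in zip(X1[i], X2[j])) for j in range(m)]
         for i in range(n)]
    T = [[p[i] * q[j] for j in range(m)] for i in range(n)]  # independent coupling
    for _ in range(outer):
        S = structure_term(C1, C2, T)
        # Gradient of the objective w.r.t. T (valid for symmetric C1, C2),
        # then a Sinkhorn projection back onto couplings with marginals p, q.
        grad = [[(1 - alpha) * M[i][j] + 2 * alpha * S[i][j] for j in range(m)]
                for i in range(n)]
        T = sinkhorn(p, q, grad, eps)
    return fgw_objective(M, C1, C2, T, alpha)


# Toy graphs: a 3-node chain and a 3-node ring, one scalar feature per node.
path_X, path_C = [[0.0], [1.0], [0.0]], [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
ring_X, ring_C = [[1.0], [1.0], [1.0]], [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
print(entropic_fgw(path_X, path_C, path_X, path_C))  # close to 0
print(entropic_fgw(path_X, path_C, ring_X, ring_C))  # about 0.593 (16/27)
```

Unlike a Euclidean distance between pooled embeddings, the second call is penalized both for feature mismatch and for the chain-versus-ring difference in adjacency.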

Paper Structure (28 sections, 15 equations, 9 figures, 3 tables, 1 algorithm)

Figures (9)

  • Figure 1: (a) Illustration of OOD and ID molecules, which differ in scaffold, size, or both. (b) A vanilla GCN performs well on ID graphs, but its performance declines rapidly when tested on OOD graphs.
  • Figure 2: Illustration of reconstruction-based OOD detection with the diffusion model. ID and OOD graphs differ in how similar they are to their respective reconstructions, so this similarity can serve as an OOD detection score.
  • Figure 3: Validation experiments performed in DrugOOD-IC50-Scaffold (left) and DrugOOD-EC50-Assay (right).
  • Figure 4: Experiments on DrugOOD. (a) The diffusion model requires a large number of iterations to obtain an effective reconstruction. (b) Reconstruction-based similarity scores are not as discriminative as expected.
  • Figure 5: Overview of the proposed PGR-MOOD method. In the training phase, a pre-trained diffusion model generates OOD samples, and $\mathcal{L}_{\mathrm{guide}}$ is computed from these generated OOD samples and the training graphs. Guided by $\mathcal{L}_{\mathrm{guide}}$, the prototypical graph generator produces prototypical graphs $\overline{G}$ that serve as the reconstructions of test inputs. In the testing phase, the similarity between each test graph and $\overline{G}$ is used as the OOD score.
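The testing phase in the Figure 5 caption can be sketched as follows: the prototypical graphs are generated once, and each test molecule is then scored by comparing it with that fixed set, so no denoising runs at test time. This is a hypothetical illustration, not the paper's implementation. Graphs are represented as sets of labeled edges, a Jaccard distance stands in for FGW, and the mean aggregation, the threshold of 0.5, and the hand-written prototypes are all assumptions.

```python
from statistics import mean


def jaccard_distance(g1, g2):
    """Stand-in graph distance on edge sets (NOT FGW): 1 - |E1 & E2| / |E1 | E2|."""
    union = g1 | g2
    return 1.0 - len(g1 & g2) / len(union) if union else 0.0


def ood_score(graph, prototypes, dist=jaccard_distance, reduce=mean):
    """Aggregate distance from one test graph to the fixed prototype set."""
    return reduce([dist(graph, p) for p in prototypes])


def detect(test_graphs, prototypes, threshold, dist=jaccard_distance):
    """Flag graphs whose score exceeds the threshold; nothing is generated here."""
    return [ood_score(g, prototypes, dist) > threshold for g in test_graphs]


# Hypothetical prototypes; in PGR-MOOD they would come from the guided generator.
prototypes = [frozenset({("C", "C"), ("C", "O")}), frozenset({("C", "C"), ("C", "N")})]
id_like = frozenset({("C", "C"), ("C", "O")})
ood_like = frozenset({("S", "S"), ("S", "P")})
print(detect([id_like, ood_like], prototypes, threshold=0.5))  # [False, True]
```

The per-molecule cost is one distance evaluation per prototype, which is why a pre-constructed prototype set scales better than running the diffusion model on every new molecule.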