Out-of-Distribution Detection in Molecular Complexes via Diffusion Models for Irregular Graphs

David Graber, Victor Armegioiu, Rebecca Buller, Siddhartha Mishra

TL;DR

We address out-of-distribution detection for irregular 3D graphs by training a unified diffusion process over both coordinates and discrete identities in a shared continuous state space. The resulting PF-ODE yields per-sample log-likelihoods and trajectory statistics that together provide a label-free typicality signal for OOD detection. The approach is validated on protein–ligand complexes with strict family-based OOD splits. Trajectory features further improve detection, rescuing difficult cases and correlating with downstream GEMS errors, which offers a practical risk assessment for downstream predictions. The approach is generative and end-to-end, suggesting a general blueprint for trajectory-aware OOD analysis in geometric deep learning beyond molecular systems, with immediate applicability to structure-based drug discovery and related domains.

Abstract

Predictive machine learning models generally excel on in-distribution data, but their performance degrades on out-of-distribution (OOD) inputs. Reliable deployment therefore requires robust OOD detection, yet this is particularly challenging for irregular 3D graphs that combine continuous geometry with categorical identities and are unordered by construction. Here, we present a probabilistic OOD detection framework for complex 3D graph data built on a diffusion model that learns a density of the training distribution in a fully unsupervised manner. A key ingredient we introduce is a unified continuous diffusion over both 3D coordinates and discrete features: categorical identities are embedded in a continuous space and trained with cross-entropy, while the corresponding diffusion score is obtained analytically via posterior-mean interpolation from predicted class probabilities. This yields a single self-consistent probability-flow ODE (PF-ODE) that produces per-sample log-likelihoods, providing a principled typicality score for distribution shift. We validate the approach on protein-ligand complexes and construct strict OOD datasets by withholding entire protein families from training. PF-ODE likelihoods identify held-out families as OOD and correlate strongly with prediction errors of an independent binding-affinity model (GEMS), enabling a priori reliability estimates on new complexes. Beyond scalar likelihoods, we show that multi-scale PF-ODE trajectory statistics - including path tortuosity, flow stiffness, and vector-field instability - provide complementary OOD information. Modeling the joint distribution of these trajectory features yields a practical, high-sensitivity detector that improves separation over likelihood-only baselines, offering a label-free OOD quantification workflow for geometric deep learning.
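The posterior-mean construction of the score for categorical identities can be illustrated with a minimal sketch. It assumes a variance-exploding noising of the form $x_t = e_k + \sigma\varepsilon$ over fixed class embeddings $e_k$; the function name `posterior_mean_score`, the one-hot embedding matrix, and the noise level are illustrative assumptions, not the authors' implementation. Under that assumption, Tweedie's formula gives the score as $(\mathbb{E}[x_0\mid x_t]-x_t)/\sigma^2$, where the posterior mean is the probability-weighted average of the class embeddings.

```python
import numpy as np

def posterior_mean_score(x_t, class_probs, embeddings, sigma):
    """Score of a noised categorical embedding (assumed noising: x_t = e_k + sigma * eps).

    x_t:         (n_nodes, d) noisy continuous embeddings
    class_probs: (n_nodes, K) predicted p(k | x_t), each row sums to 1
    embeddings:  (K, d) fixed class embeddings e_k (hypothetical choice)
    """
    # Posterior mean E[x_0 | x_t] as a probability-weighted mix of class embeddings.
    x0_hat = class_probs @ embeddings
    # Tweedie's formula for Gaussian noise with standard deviation sigma.
    return (x0_hat - x_t) / sigma**2

rng = np.random.default_rng(0)
embeddings = np.eye(3)                      # toy one-hot embeddings for 3 classes
sigma = 0.5
x_t = embeddings[[0, 2]] + sigma * rng.normal(size=(2, 3))
probs = np.array([[1.0, 0.0, 0.0],          # confident prediction: class 0
                  [0.2, 0.3, 0.5]])         # uncertain prediction
score = posterior_mean_score(x_t, probs, embeddings, sigma)
```

With a confident prediction the score points from the noisy embedding straight toward the predicted class embedding; with an uncertain prediction it points toward a blend of embeddings. The network itself can still be trained with cross-entropy, since only its class probabilities enter the formula.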

Paper Structure

This paper contains 85 sections, 6 theorems, 78 equations, 10 figures, 7 tables, 2 algorithms.

Key Result

Proposition 2.1

Let $x\sim p_{\mathrm{ID}}$ and define $L(x)=-\log q_\phi(x)$ with typical value $L_{\mathrm{typ}}=\mathbb{E}_{\mathrm{ID}}[L(x)]$. Assume $L(x)$ has finite ID variance $\mathrm{Var}_{\mathrm{ID}}(L(x))\le \sigma^2$, and that there exists a non-decreasing calibration curve $\phi$ such that $e_\theta$ [...] equivalently, $\mathbb{P}_{\mathrm{ID}}\!\left(e_\theta(x)\le \phi(L_{\mathrm{typ}}+\alpha)\right)$ [...]
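The statement above is truncated in this summary. As a hedged illustration of how a bound of this shape can arise, the sketch below assumes the calibration condition is the pointwise inequality $e_\theta(x)\le\phi(L(x))$ and uses Chebyshev's inequality; both the assumed condition and the resulting constant $\sigma^2/\alpha^2$ are assumptions, and the paper's exact statement may differ.

```latex
% Assumption (not confirmed by the truncated source): e_\theta(x) \le \phi(L(x)) for ID samples.
% Step 1: Chebyshev's inequality on L(x), for any \alpha > 0.
\mathbb{P}_{\mathrm{ID}}\bigl(L(x) \ge L_{\mathrm{typ}} + \alpha\bigr)
  \le \mathbb{P}_{\mathrm{ID}}\bigl(|L(x) - L_{\mathrm{typ}}| \ge \alpha\bigr)
  \le \frac{\mathrm{Var}_{\mathrm{ID}}(L(x))}{\alpha^2}
  \le \frac{\sigma^2}{\alpha^2}.

% Step 2: on the complementary event L(x) < L_{\mathrm{typ}} + \alpha,
% monotonicity of \phi gives
e_\theta(x) \le \phi(L(x)) \le \phi(L_{\mathrm{typ}} + \alpha).

% Step 3: combining the two steps,
\mathbb{P}_{\mathrm{ID}}\bigl(e_\theta(x) \le \phi(L_{\mathrm{typ}} + \alpha)\bigr)
  \ge 1 - \frac{\sigma^2}{\alpha^2}.
```

Read this way, a sample whose negative log-likelihood stays within $\alpha$ of the typical value has its error capped by the calibration curve evaluated at $L_{\mathrm{typ}}+\alpha$, and the cap fails with probability at most $\sigma^2/\alpha^2$.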

Figures (10)

  • Figure 1: Out-of-distribution datasets yield lower log-likelihoods: Distributions of log-likelihoods assigned to protein-ligand complexes belonging to the a) training, b) validation, c) CASF2016 and the out-of-distribution (OOD) datasets d) 1nvq, e) 1sqa, f) 2p15, g) 2vw5, h) 3f3e and i) 3o9i. Lower values signify increased deviation from the learned distribution (OOD) and higher values indicate high in-distribution (ID) probability. Means, medians, standard deviations (Std) and number of samples (N) are depicted on the right side of the histograms. Vertical dashed lines indicate the median of the respective log-likelihood distributions. Distributions were subjected to individual outlier removal using the IQR method (1.5×IQR rule) followed by min-max normalization to a [-1, 0] range using global minimum and maximum values across all distributions. The training distribution was randomly subsampled, showing only a fraction of all 10,510 complexes.
  • Figure 2: Log-likelihood distributions align with protein and ligand similarity metrics: Comparison of the distributions of a) log-likelihoods assigned by the diffusion model with the distributions of train-test similarity scores (b, c and d). b) Ligand Similarity: Calculated using Tanimoto scores between count-based molecular fingerprints. High scores indicate high similarity (1.0 = identical). c) Protein Similarity: Determined by TM-align based on optimal 3D protein structure alignment. High scores indicate high structural similarity (1.0 = identical). d) Aggregated Similarity: A composite score $S=\max(\mathrm{Tanimoto}+\mathrm{TMScore}+(1-\mathrm{RMSD}),\,0)$ was calculated as the sum of Tanimoto similarity, TM-scores, and inverted pocket-aligned ligand root mean squared deviation (RMSD). High scores signify highly similar protein-ligand complexes (3.0 = identical complexes). Each box (b, c and d) represents the distribution of $N=10,510$ similarity scores. Specifically, these are the scores between each training dataset complex and its most similar counterpart in the respective test dataset, as measured by that specific metric. Boxplots show the median (centre line), 25th–75th percentiles (box), whiskers extend to data points within 1.5 × IQR, outliers are not shown.
  • Figure 3: Correlation between log-likelihoods and GEMS performance: Comparison of log-likelihood distributions obtained for training, validation, CASF2016 and out-of-distribution (OOD) datasets with performance metrics achieved by the GEMS binding affinity prediction model trained on the same data. The box plot shows the distribution of log-likelihoods assigned to each dataset's protein-ligand complexes by the diffusion model, where lower values signify increased deviation from the learned distribution (OOD) and higher values indicate high in-distribution probability (ID). The bar plot shows the corresponding performance metrics that GEMS achieved on the same datasets, including R-squared (R2), Kendall (Tau) and Spearman (Rho) rank correlation coefficients. Boxplots show the median (centre line), 25th–75th percentiles (box), whiskers extend to data points within 1.5 × IQR, outliers are not shown.
  • Figure 4: Out-of-distribution complexes yield higher GEMS errors: Heatmaps illustrating the relationship between the log-likelihoods assigned by the diffusion model and the corresponding GEMS binding affinity prediction errors (y-axis) across the training, validation, CASF2016 and out-of-distribution (OOD) datasets. The x-axis represents the log-likelihood for each complex, where lower values signify increased deviation from the learned distribution (OOD) and higher values indicate high in-distribution probability (ID). The y-axis represents the variance-normalized errors of the GEMS binding affinity prediction. Color intensity represents density, with white areas containing no data. To allow fair comparison across datasets with different label spreads, absolute errors are normalized by the label variance, preventing artificially low error values on narrowly distributed datasets. The heatmaps visually confirm that low log-likelihoods are associated with higher GEMS prediction errors.
  • Figure S1: Error-log-likelihood relation follows a scaled and shifted exponential: Scatterplot illustrating the relationship between the log-likelihoods assigned by the diffusion model (x-axis) and the corresponding GEMS binding affinity prediction errors (y-axis) across the validation, CASF2016, and the out-of-distribution (OOD) datasets. A hierarchical fitting of five scaled and shifted exponential curves shows that log-likelihoods provide a rough estimate of the expected GEMS prediction error. Across all non-training complexes ($N=6223$), 75.8% had predicted errors that fell within the bounds of the fitted exponential curves, with 8.84% high outliers (errors above the fit) and 15.39% low outliers (errors below the fit). To allow fair comparison across datasets with different label spreads, absolute errors are normalized by the label variance, preventing artificially low error values on narrowly distributed datasets. Exponential curves are fit to a maximum of 500 randomly sampled points from each non-training distribution. Only 100 randomly sampled complexes from each dataset are shown to improve visibility.
  • ...and 5 more figures
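The preprocessing described in the Figure 1 caption and the composite score from Figure 2 can be sketched as follows. This is a minimal reading of the captions, not the authors' code: the function names are hypothetical, the toy values are made up for illustration, and it assumes that the global minimum and maximum are taken after per-dataset outlier removal, which the caption does not state explicitly.

```python
import numpy as np

def iqr_filter(values, k=1.5):
    """Drop points outside [Q1 - k*IQR, Q3 + k*IQR] (the 1.5 x IQR rule)."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    keep = (values >= q1 - k * iqr) & (values <= q3 + k * iqr)
    return values[keep]

def normalize_global(datasets):
    """Filter each dataset individually, then min-max scale all of them to [-1, 0]
    with one global minimum and maximum (assumed to be taken after filtering).
    Assumes the filtered values are not all identical."""
    filtered = {name: iqr_filter(v) for name, v in datasets.items()}
    lo = min(v.min() for v in filtered.values())
    hi = max(v.max() for v in filtered.values())
    return {name: (v - lo) / (hi - lo) - 1.0 for name, v in filtered.items()}

def aggregated_similarity(tanimoto, tm_score, rmsd):
    """Composite score S = max(Tanimoto + TMScore + (1 - RMSD), 0) from Figure 2."""
    return max(tanimoto + tm_score + (1.0 - rmsd), 0.0)

# Toy log-likelihoods (illustrative values only).
toy = {
    "train": [-10.0, -11.0, -12.0, -9.0, -200.0],   # -200 is an outlier
    "ood": [-30.0, -32.0, -31.0, -29.0],
}
scaled = normalize_global(toy)
```

Using one global range keeps the datasets comparable on a shared axis, so a shift of the OOD histograms toward -1 reflects a real difference in log-likelihood rather than a per-panel rescaling.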

Theorems & Definitions (10)

  • Proposition 2.1: Likelihood controls error with high probability
  • Lemma S1.2: Time derivative of KL along the PF-ODE
  • proof
  • Lemma S1.3: Cauchy–Schwarz bound on $\dot D_t$
  • proof
  • Proposition S1.5: Score error controls final-time KL up to capacity
  • Lemma S1.7: NLL concentration under $p_0$
  • proof
  • Theorem S1.9: High-probability error bound from PF-ODE NLL
  • proof