A Geometric Explanation of the Likelihood OOD Detection Paradox

Hamidreza Kamkari, Brendan Leigh Ross, Jesse C. Cresswell, Anthony L. Caterini, Rahul G. Krishnan, Gabriel Loaiza-Ganem

TL;DR

This work gives a geometric explanation for why likelihood-based deep generative models (DGMs) assign high density to out-of-distribution (OOD) data yet never generate such samples: when the OOD data lie on a low-dimensional manifold, the model density can be sharply peaked around that manifold while the region still carries very little probability mass. It uses local intrinsic dimension (LID) to diagnose these regions, and pairs LID estimates with log-likelihoods from pre-trained normalizing flows or score-based diffusion models to form a dual-threshold OOD detector. The method matches or surpasses state-of-the-art OOD detection results obtained with the same DGM backbones across a range of datasets, and it is unsupervised and not tied to a single model family. The work also points to future research on improving LID estimation in diffusion models and on extending the approach to broader classes of DGMs.

Abstract

Likelihood-based deep generative models (DGMs) commonly exhibit a puzzling behaviour: when trained on a relatively complex dataset, they assign higher likelihood values to out-of-distribution (OOD) data from simpler sources. Adding to the mystery, OOD samples are never generated by these DGMs despite having higher likelihoods. This two-pronged paradox has yet to be conclusively explained, making likelihood-based OOD detection unreliable. Our primary observation is that high-likelihood regions will not be generated if they contain minimal probability mass. We demonstrate how this seeming contradiction of large densities yet low probability mass can occur around data confined to low-dimensional manifolds. We also show that this scenario can be identified through local intrinsic dimension (LID) estimation, and propose a method for OOD detection which pairs the likelihoods and LID estimates obtained from a pre-trained DGM. Our method can be applied to normalizing flows and score-based diffusion models, and obtains results which match or surpass state-of-the-art OOD detection benchmarks using the same DGM backbones. Our code is available at https://github.com/layer6ai-labs/dgm_ood_detection.
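
The claim that a region can have large density yet minimal probability mass can be checked numerically on a toy case. The sketch below is not taken from the paper or its code: it assumes a hypothetical 1D two-component Gaussian mixture, with a broad component standing in for in-distribution data and a very narrow, low-weight component standing in for the peaked OOD region. All names and parameter values are illustrative assumptions.

```python
import math

def normal_pdf(x, mean, std):
    """Density of a 1D Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mean, std):
    """Cumulative distribution function of a 1D Gaussian at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

# Hypothetical mixture (illustrative values, not from the paper):
# a broad in-distribution component and a narrow, low-weight OOD spike.
W_IN, MEAN_IN, STD_IN = 0.99, 0.0, 1.0
W_OOD, MEAN_OOD, STD_OOD = 0.01, 5.0, 1e-3

def density(x):
    """Mixture density at x."""
    return (W_IN * normal_pdf(x, MEAN_IN, STD_IN)
            + W_OOD * normal_pdf(x, MEAN_OOD, STD_OOD))

def mass(lo, hi):
    """Probability mass the mixture assigns to the interval [lo, hi]."""
    return (W_IN * (normal_cdf(hi, MEAN_IN, STD_IN) - normal_cdf(lo, MEAN_IN, STD_IN))
            + W_OOD * (normal_cdf(hi, MEAN_OOD, STD_OOD) - normal_cdf(lo, MEAN_OOD, STD_OOD)))

density_in = density(MEAN_IN)    # density at the in-distribution mode
density_ood = density(MEAN_OOD)  # density at the OOD spike
mass_in = mass(-3.0, 3.0)        # mass within 3 std of the in-distribution mode
mass_ood = mass(MEAN_OOD - 5 * STD_OOD, MEAN_OOD + 5 * STD_OOD)  # mass around the spike

print(f"density: in={density_in:.3f}, ood={density_ood:.3f}")
print(f"mass:    in={mass_in:.4f}, ood={mass_ood:.4f}")
```

In this toy setup the density at the spike is roughly ten times the density at the in-distribution mode, while the spike holds only about 1% of the total mass, so samples almost never land there. This mirrors the 1D picture described for Figure 1(a), under the stated assumptions.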

Paper Structure

This paper contains 38 sections, 17 equations, 14 figures, 9 tables, 1 algorithm.

Figures (14)

  • Figure 1: (a) A 1D density which is highly peaked in the OOD region (red) assigns high likelihood, but low probability mass to OOD data. (b) An analogous sketch for a 2D density concentrated around a 1D OOD manifold (red line), illustrated with FMNIST as in-distribution and MNIST as OOD. The model density has become sharply peaked around the manifold of "simpler" data which has low intrinsic dimension, which is nonetheless assigned lower probability mass as it has negligible volume.
  • Figure 2: (a) An FMNIST-trained DM assigns higher likelihoods to MNIST. (b) An NF trained on FMNIST shows notably lower likelihoods on its own generated samples than on OOD data. (c-e) Analogous pathologies on RGB datasets, both for DMs and NFs.
  • Figure 3: LID estimates and likelihood scatterplots, along with corresponding marginals. (a) FMNIST-trained model, evaluated on FMNIST, MNIST, and generated samples. (b) MNIST-trained model, evaluated on FMNIST, MNIST, and generated samples.
  • Figure 4: ROC visualizations for select pathological OOD tasks on NFs. The red dots correspond to the FPR-TPR pairs of our method obtained from different dual thresholds, the yellow areas correspond to the region under the associated Pareto frontier (i.e. the upper boundary of the red dots), while the blue areas represent the region below the ROC curve for single threshold likelihood-based classifiers. (a) FMNIST-trained model with MNIST as OOD; (b) as in (a) except we now discern between generated samples and MNIST. (c) CIFAR10-trained model with SVHN as OOD; (d) as in (c) except we now discern between generated samples and CelebA.
  • Figure 5: Overview of likelihood pathologies: (a-d) Models trained on EMNIST or MNIST assign the highest likelihoods to in-distribution data as expected, but obtain strikingly low likelihoods on generated samples -- even lower than for OOD data. (e-m) Pathologies on RGB datasets where both the in-distribution samples and the generated samples are assigned likelihoods smaller than that of OOD datapoints.
  • ...and 9 more figures
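
The dual-threshold evaluation mentioned in the Figure 4 caption can be illustrated with a minimal sketch. This is not the authors' implementation: the decision rule below (flag a point as OOD when its log-likelihood is low, or when its LID estimate is low) is an assumption consistent with the summary above, the function names are hypothetical, and the scores are made-up toy numbers chosen only to mimic the pathology, not results from the paper.

```python
def is_ood(log_likelihood, lid, ll_threshold, lid_threshold):
    """Assumed dual-threshold rule: a point is OOD if its log-likelihood is
    below ll_threshold, or if its LID estimate is below lid_threshold
    (high density concentrated on a low-dimensional region)."""
    return log_likelihood < ll_threshold or lid < lid_threshold

def fpr_tpr(in_scores, ood_scores, ll_threshold, lid_threshold):
    """Return (FPR, TPR) with OOD as the positive class.
    Each score is a (log_likelihood, lid) pair."""
    false_pos = sum(is_ood(ll, lid, ll_threshold, lid_threshold) for ll, lid in in_scores)
    true_pos = sum(is_ood(ll, lid, ll_threshold, lid_threshold) for ll, lid in ood_scores)
    return false_pos / len(in_scores), true_pos / len(ood_scores)

# Toy, made-up (log-likelihood, LID) pairs: the OOD points have HIGHER
# likelihood but LOWER LID than the in-distribution points.
in_scores = [(-1000.0, 30.0), (-1100.0, 28.0), (-950.0, 33.0), (-1050.0, 31.0)]
ood_scores = [(-700.0, 8.0), (-650.0, 10.0), (-720.0, 7.0), (-680.0, 9.0)]

# Likelihood-only classifier: disable the LID test with a -inf threshold.
likelihood_only = fpr_tpr(in_scores, ood_scores, -800.0, float("-inf"))

# Dual-threshold classifier: a permissive likelihood threshold plus an LID threshold.
dual = fpr_tpr(in_scores, ood_scores, -1200.0, 20.0)

# Sweep both thresholds over the observed values to collect FPR-TPR pairs,
# analogous to the scatter of points whose upper boundary is a Pareto frontier.
all_scores = in_scores + ood_scores
ll_grid = sorted({ll for ll, _ in all_scores}) + [float("inf")]
lid_grid = sorted({lid for _, lid in all_scores}) + [float("inf")]
pairs = {fpr_tpr(in_scores, ood_scores, t_ll, t_lid) for t_ll in ll_grid for t_lid in lid_grid}

print("likelihood only (FPR, TPR):", likelihood_only)
print("dual threshold  (FPR, TPR):", dual)
print("number of distinct FPR-TPR pairs:", len(pairs))
```

On these toy scores, thresholding the likelihood alone flags every in-distribution point and no OOD point, whereas adding the LID threshold separates the two groups. Real LID estimates and likelihoods would come from the pre-trained DGM, which this sketch does not model.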