Out-of-distribution detection using normalizing flows on the data manifold

Seyedeh Fatemeh Razavi; Mohammad Mahdi Mehmanchi; Reshad Hosseini; Mostafa Tavassolipour

TL;DR

This work tackles out-of-distribution detection for high-dimensional data by addressing the failure of standard normalizing flows to distinguish ID from OOD samples. It introduces a joint approach that learns a density on a data manifold and a distance-to-manifold penalty, while also incorporating a data-complexity correction at test time, all without changing the NF architecture or requiring OOD training data. The latent space is decomposed into on-manifold and off-manifold components, with $p_Z(z)=p_U(u)p_V(v)$, and the training objective combines negative log-likelihood, a manifold-reconstruction penalty (via a Huber switch), and a data-complexity term; the penalty is scaled by a variance-derived factor to harmonize terms. Empirical results on color and grayscale image benchmarks show improved OOD detection (AUROC) over strong baselines, with the best performance achieved when combining manifold learning and data complexity (P+IC), and the method also maintains competitive generation quality. The findings highlight the practical value of integrating manifold-informed likelihood with test-time complexity measures for robust OOD detection in real-world applications.
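As a rough illustration of how a likelihood term and a distance-to-manifold penalty can be combined into one score, here is a minimal sketch. It uses the standard Huber function for the penalty. The names `huber`, `manifold_score`, `delta`, and `weight` are hypothetical; the on-manifold NLL and the reconstruction residuals are assumed to come from an already trained flow; and the fixed `weight` only stands in for the variance-derived scaling, whose exact form is not given in this summary.

```python
def huber(r, delta=1.0):
    """Standard Huber function: quadratic for |r| <= delta, linear beyond."""
    a = abs(r)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)


def manifold_score(nll, residuals, delta=1.0, weight=1.0):
    """Toy OOD score: on-manifold NLL plus a weighted Huber penalty.

    nll       -- negative log-likelihood of the on-manifold part (assumed given)
    residuals -- per-coordinate reconstruction residuals x - x_hat (assumed given)
    weight    -- hypothetical stand-in for the variance-derived scaling factor
    """
    penalty = sum(huber(r, delta) for r in residuals)
    return nll + weight * penalty


# Two samples with the same NLL but different distances from the manifold.
on_manifold = manifold_score(nll=10.0, residuals=[0.1, -0.2, 0.05])
off_manifold = manifold_score(nll=10.0, residuals=[3.0, -4.0, 2.5])
```

In this sketch a higher score means more OOD-like: the two samples share the same NLL, so only the Huber penalty separates them.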

Abstract

A common approach to out-of-distribution (OOD) detection estimates the underlying data distribution, on the intuition that OOD data have lower likelihoods. Normalizing flows are likelihood-based generative models that provide tractable density estimation via dimension-preserving invertible transformations. Conventional normalizing flows are prone to fail at OOD detection because of the well-known curse of dimensionality that affects likelihood-based models. To address this problem, some works modify the likelihood, for example by incorporating a data complexity measure. We observed that these modifications are still insufficient. According to the manifold hypothesis, real-world data often lie on a low-dimensional manifold. We therefore estimate the density on a low-dimensional manifold and compute the distance from the manifold as a measure for OOD detection. We propose a powerful criterion that combines this measure with the modified likelihood measure based on data complexity. Extensive experimental results show that incorporating manifold learning while accounting for the estimated data complexity improves the OOD detection ability of normalizing flows. This improvement is achieved without modifying the model structure or using auxiliary OOD data during training.
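To make the idea of a complexity-modified likelihood concrete, here is a minimal sketch of one known form of such a correction: subtracting a compression-based complexity estimate from the NLL, both measured in bits. Whether the paper uses this exact estimator is not stated in this section, so the use of `zlib` and the names `complexity_bits` and `complexity_corrected_score` are assumptions made for illustration.

```python
import random
import zlib


def complexity_bits(data: bytes) -> int:
    """Compressed length in bits, used here as a crude complexity estimate."""
    return 8 * len(zlib.compress(data, 9))


def complexity_corrected_score(nll_bits: float, data: bytes) -> float:
    """NLL (in bits) minus the complexity estimate; higher means more OOD-like."""
    return nll_bits - complexity_bits(data)


rng = random.Random(0)
flat = bytes(1024)                                      # constant "image"
noisy = bytes(rng.getrandbits(8) for _ in range(1024))  # high-entropy "image"

# With identical NLL, the simple input keeps a higher (more suspicious) score,
# because little of its NLL can be attributed to complexity.
flat_score = complexity_corrected_score(5000.0, flat)
noisy_score = complexity_corrected_score(5000.0, noisy)
```

The point of the correction is that a low NLL on a very simple input is not, by itself, evidence that the input is in-distribution.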

Paper Structure (19 sections, 16 equations, 6 figures, 5 tables)

Introduction
Related Work
Manifold learning using NFs
Out-of-distribution detection
Non-density-based methods
Density-based methods
Preliminaries
Normalizing Flow
Reconstruction loss functions
Proposed method
Results
Experimental setting
Dataset
Architecture
Training
...and 4 more sections

Figures (6)

Figure 1: A semicircle toy dataset illustrating the proposed score for OOD detection. This score combines the NLL, which estimates density, with a measurement of the distance to the manifold, referred to as the reconstruction loss.
Figure 2: Generated CelebA images corresponding to experiments in Table \ref{tab:table_face_manifold_CelebA}.
Figure 3: Several generated (first row) and reconstructed images (CelebA as ID data in the second row; SVHN and CIFAR10 as OOD data in the third and fourth rows, respectively) from the proposed-D method for different manifold dimensions (10, 50, 100, 500, and 1000, from left to right) for a model trained on the CelebA dataset. $\mathcal{M}_G \subset \mathbb{R}^{d}$ and $\mathcal{M}_R \subset \mathbb{R}^{d}$ denote image generation and image reconstruction, respectively, from a manifold of dimension $d$.
Figure 4: All $28 \times 28$ grayscale SVHN test images misdetected as ID by a model trained on MNIST.
Figure 5: Distance from the manifold (reconstruction error) for models trained on three ID datasets with two penalty functions. RMSE denotes the square root of the MSE, while $\sqrt{H_\delta}$ denotes the square root of the Huber penalty, which gives a more comparable measure.
...and 1 more figures
