Out-of-distribution detection using normalizing flows on the data manifold
Seyedeh Fatemeh Razavi, Mohammad Mahdi Mehmanchi, Reshad Hosseini, Mostafa Tavassolipour
TL;DR
This work addresses out-of-distribution (OOD) detection for high-dimensional data, where standard normalizing flows (NFs) often fail to distinguish in-distribution (ID) from OOD samples. It introduces a joint approach that learns a density on the data manifold together with a distance-to-manifold penalty, and incorporates a data-complexity correction at test time, all without changing the NF architecture or requiring OOD training data. The latent space is decomposed into an on-manifold component $u$ and an off-manifold component $v$, with $p_Z(z)=p_U(u)p_V(v)$. The training objective combines negative log-likelihood, a manifold-reconstruction penalty (via a Huber switch), and a data-complexity term; the penalty is scaled by a variance-derived factor so that the terms are on comparable scales. Empirical results on color and grayscale image benchmarks show improved OOD detection (measured by AUROC) over strong baselines, with the best performance achieved when manifold learning and data complexity are combined (P+IC). The method also maintains competitive generation quality. The findings highlight the practical value of integrating a manifold-informed likelihood with test-time complexity measures for robust OOD detection.
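The factorization $p_Z(z)=p_U(u)p_V(v)$ and a Huber-style penalty can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the Gaussian forms of $p_U$ and $p_V$, the value of `sigma_v`, the Huber threshold `delta`, the use of the norm of $v$ as a stand-in for distance to the manifold, and all function names are hypothetical.

```python
import math

def gaussian_log_prob(x, sigma=1.0):
    """Log-density of a zero-mean isotropic Gaussian at vector x."""
    d = len(x)
    sq_norm = sum(xi * xi for xi in x)
    return (-0.5 * sq_norm / sigma ** 2
            - d * math.log(sigma)
            - 0.5 * d * math.log(2.0 * math.pi))

def split_latent(z, manifold_dim):
    """Split z into on-manifold u (first manifold_dim entries) and off-manifold v."""
    return z[:manifold_dim], z[manifold_dim:]

def factorized_log_prob(z, manifold_dim, sigma_v=0.1):
    """log p_Z(z) = log p_U(u) + log p_V(v).

    Assumption: p_U is a standard normal and p_V is a narrow Gaussian.
    """
    u, v = split_latent(z, manifold_dim)
    return gaussian_log_prob(u, 1.0) + gaussian_log_prob(v, sigma_v)

def huber(r, delta=1.0):
    """Huber function: quadratic for |r| <= delta, linear beyond it."""
    a = abs(r)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)

def off_manifold_penalty(v, delta=1.0):
    """Huber penalty on the norm of v (an assumed proxy for distance to the manifold)."""
    return huber(math.sqrt(sum(vi * vi for vi in v)), delta)
```

In this sketch the Huber function switches from a quadratic to a linear regime at `delta`, so large off-manifold deviations are penalized less aggressively than under a pure squared error.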
Abstract
Building on the intuition that out-of-distribution data have lower likelihoods, a common approach to out-of-distribution detection is to estimate the underlying data distribution. Normalizing flows are likelihood-based generative models that provide tractable density estimation via dimension-preserving invertible transformations. Conventional normalizing flows are prone to fail at out-of-distribution detection because of the well-known curse of dimensionality that affects likelihood-based models. To address this problem, some works modify the likelihood, for example by incorporating a data-complexity measure. We observe that these modifications are still insufficient. According to the manifold hypothesis, real-world data often lie on a low-dimensional manifold. We therefore estimate the density on a low-dimensional manifold and compute the distance from the manifold as a measure for out-of-distribution detection. We propose a criterion that combines this measure with the complexity-corrected likelihood. Extensive experimental results show that incorporating manifold learning while accounting for data complexity improves the out-of-distribution detection ability of normalizing flows. This improvement is achieved without modifying the model structure or using auxiliary out-of-distribution data during training.
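As a rough illustration of a criterion that combines a complexity-corrected likelihood with a distance-to-manifold term, the following sketch uses compressed length as the data-complexity estimate. The choice of `zlib` as the compressor, the additive combination, the `weight` parameter, and all function names are assumptions made for illustration; the abstract does not specify them.

```python
import zlib

def complexity_bits(data: bytes) -> int:
    """Estimate data complexity as the zlib-compressed length in bits (assumed proxy)."""
    return 8 * len(zlib.compress(data, 9))

def complexity_corrected_score(nll_bits: float, data: bytes) -> float:
    """Negative log-likelihood (in bits) minus the complexity estimate.

    Higher values suggest the sample is more likely out-of-distribution.
    """
    return nll_bits - complexity_bits(data)

def combined_score(nll_bits: float, data: bytes,
                   manifold_distance: float, weight: float = 1.0) -> float:
    """Hypothetical combination: corrected likelihood plus weighted manifold distance."""
    return complexity_corrected_score(nll_bits, data) + weight * manifold_distance
```

Here `nll_bits` and `manifold_distance` would come from a trained flow; they are passed in as plain numbers so the sketch stays independent of any particular model.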
