Understanding Likelihood of Normalizing Flow and Image Complexity through the Lens of Out-of-Distribution Detection
Genki Osada, Tsubasa Takahashi, Takashi Nishide
TL;DR
This paper investigates why likelihood-based OOD detection using Normalizing Flows and Autoregressive models can fail, particularly when simple or low-complexity images are involved. It introduces Density Concentration Attraction for Simpleness (DCAS), a mechanism whereby less complex inputs map to high-density regions in the NF latent space, increasing both $\log p(\mathbf{z})$ and the volume term $\log |\det J_f(\mathbf{x})|$, thereby inflating $\log p(\mathbf{x})$. Across five NF architectures and PixelCNN++, the authors demonstrate that likelihood-based methods are untrustworthy for OOD detection, and show that incorporating image complexity as an independent variable via a two-variable Gaussian Mixture Model greatly enhances detection performance on MNIST, CIFAR-10, ImageNet, and related datasets. The proposed complexity-aware approach yields robust OOD detection across diverse settings and highlights a practical path to mitigating DCAS-driven failures in DGM-based OOD detection. Overall, the work provides a unified explanation for likelihood failures and offers a scalable, architecture-agnostic remedy with clear implications for safety-critical applications.
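For reference, the two terms named above are the summands of the standard change-of-variables identity for a Normalizing Flow. Writing $\mathbf{z} = f(\mathbf{x})$ for the flow's invertible map (notation assumed here to match the symbols in the summary), the log-likelihood decomposes as:

$$
\log p(\mathbf{x}) = \log p(\mathbf{z}) + \log \left| \det J_f(\mathbf{x}) \right|, \qquad \mathbf{z} = f(\mathbf{x}).
$$

Because the two terms add, any input property that raises both of them, as DCAS claims low complexity does, raises $\log p(\mathbf{x})$ regardless of whether the input resembles the training data.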
Abstract
Out-of-distribution (OOD) detection is crucial to safety-critical machine learning applications and has been extensively studied. While recent studies have predominantly focused on classifier-based methods, research on deep generative model (DGM)-based methods has lagged behind. This disparity may be attributed to a perplexing phenomenon: DGMs often assign higher likelihoods to unknown OOD inputs than to their known training data. This paper focuses on explaining the underlying mechanism of this phenomenon. We propose a hypothesis that less complex images concentrate in high-density regions in the latent space, resulting in a higher likelihood assignment in the Normalizing Flow (NF). We experimentally demonstrate its validity for five NF architectures, concluding that their likelihood is untrustworthy. Additionally, we show that this problem can be alleviated by treating image complexity as an independent variable. Finally, we provide evidence of the potential applicability of our hypothesis in another DGM, PixelCNN++.
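The idea of treating image complexity as an independent variable can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: complexity is approximated by zlib-compressed size in bits per dimension (this section does not state the paper's actual measure), a single two-variable Gaussian stands in as a one-component simplification of a Gaussian Mixture Model, and the function names and toy numbers are hypothetical.

```python
import math
import random
import zlib


def complexity_bits_per_dim(pixels: bytes) -> float:
    """Complexity proxy (assumption): zlib-compressed size in bits per pixel value."""
    return 8.0 * len(zlib.compress(pixels, 9)) / len(pixels)


def fit_gaussian_2d(points):
    """Maximum-likelihood mean and covariance of (log-likelihood, complexity) pairs."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return (mx, my), (sxx, sxy, syy)


def log_density(point, mean, cov):
    """Log-density under the fitted 2D Gaussian; low values suggest an OOD input."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # Mahalanobis distance using the closed-form inverse of a 2x2 covariance.
    maha = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return -0.5 * maha - math.log(2.0 * math.pi) - 0.5 * math.log(det)


if __name__ == "__main__":
    # A flat image compresses far better than a noisy one.
    rng = random.Random(0)
    flat = bytes([128]) * 3072
    noisy = bytes(rng.randrange(256) for _ in range(3072))
    print(complexity_bits_per_dim(flat), complexity_bits_per_dim(noisy))

    # Hypothetical (log-likelihood, complexity) pairs for in-distribution data.
    train = [(-3.5, 5.0), (-3.2, 4.9), (-3.8, 5.3),
             (-3.4, 4.6), (-3.6, 5.4), (-3.3, 4.8)]
    mean, cov = fit_gaussian_2d(train)

    # A simple OOD image: higher likelihood than any training point, low complexity.
    print(log_density((-3.5, 5.0), mean, cov), log_density((-2.0, 2.0), mean, cov))
```

In this toy setup, a threshold on likelihood alone would accept the simple OOD point because its likelihood exceeds every training value, while the joint density over both variables assigns it a much lower score than a typical in-distribution point.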
