Mitigating the Likelihood Paradox in Flow-based OOD Detection via Entropy Manipulation

Donghwan Kim, Hyunsoo Yoon

TL;DR

Normalizing flows can misrank in-distribution vs. out-of-distribution inputs due to entropy-related effects in likelihood. The authors propose SPEM, a training-free, test-time entropy manipulation method that scales perturbations based on semantic similarity from an in-distribution memory bank, preserving the underlying likelihood score while enhancing separation. They prove lower bounds showing that entropy perturbations can widen the ID/OOD log-likelihood gap and validate SPEM across ten ID/OOD pairs, achieving consistent AUROC gains over baselines and showing robustness to entropy ordering. Analyzing SPEM-noise reveals scenarios where perturbing with Gaussian noise alone can yield strong separation, highlighting the role of entropy and KL terms. Overall, SPEM offers a practical, architecture-agnostic approach to align likelihood-based OOD detection with semantic typicality without additional model training.

Abstract

Deep generative models that can tractably compute input likelihoods, including normalizing flows, often assign unexpectedly high likelihoods to out-of-distribution (OOD) inputs. We mitigate this likelihood paradox by manipulating input entropy based on semantic similarity, applying stronger perturbations to inputs that are less similar to an in-distribution memory bank. We provide a theoretical analysis showing that entropy control increases the expected log-likelihood gap between in-distribution and OOD samples in favor of the in-distribution, and we explain why the procedure works without any additional training of the density model. We then evaluate our method against likelihood-based OOD detectors on standard benchmarks and find consistent AUROC improvements over baselines, supporting our explanation.
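The similarity-dependent perturbation described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's procedure: the scaling rule `sigma = alpha * (1 - s) / 2`, the use of maximum cosine similarity, and the function names are assumptions introduced here, since this summary does not state the exact rule.

```python
import numpy as np

def max_cosine_similarity(features, bank):
    """Largest cosine similarity between each feature row and any bank row."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    b = bank / np.linalg.norm(bank, axis=1, keepdims=True)
    return np.clip((f @ b.T).max(axis=1), -1.0, 1.0)

def similarity_scaled_perturbation(x, features, bank, alpha=0.1, rng=None):
    """Add Gaussian noise whose scale grows as similarity to the bank drops.

    ASSUMPTION: sigma = alpha * (1 - s) / 2, which lies in [0, alpha].
    This rule is hypothetical and only illustrates the idea.
    """
    rng = np.random.default_rng() if rng is None else rng
    s = max_cosine_similarity(features, bank)
    sigma = alpha * (1.0 - s) / 2.0
    # Reshape so one sigma per sample broadcasts over the remaining axes.
    sigma_b = sigma.reshape((-1,) + (1,) * (x.ndim - 1))
    return x + sigma_b * rng.standard_normal(x.shape), sigma

# Toy usage: the first input matches a bank entry, the second matches none.
bank = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
features = np.array([[2.0, 0.0, 0.0], [0.0, 0.0, 3.0]])
x = np.zeros((2, 4))
x_pert, sigma = similarity_scaled_perturbation(
    x, features, bank, alpha=0.1, rng=np.random.default_rng(0)
)
```

In this toy case the first input is left untouched and the second receives noise. In a full pipeline the perturbed input would then be scored by the unchanged density model, which is why no retraining is involved.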

Paper Structure

This paper contains 22 sections, 18 theorems, 86 equations, 4 figures, 7 tables.

Key Result

Theorem 3.1

Let $P$, $P_\theta$, $Q$ be $d$-dimensional continuous probability distributions on $\mathbb{R}^d$. Let $X \sim Q$, $Z \sim \mathcal{N}(0, \sigma^2 I_d)$, and define $Q'$ as the distribution of $X+Z$. Then a lower bound on the expected log-likelihood difference estimated by $P_\theta$ between $P$ an
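For orientation, the standard decomposition below shows where entropy enters an expected log-likelihood. It is a textbook identity, not the bound of Theorem 3.1, whose statement is cut off above. The notation is an assumption of this sketch: $p_\theta$ is the density of $P_\theta$, $h(\cdot)$ is differential entropy, and all quantities are taken to be finite.

```latex
\begin{align}
\mathbb{E}_{x \sim Q}\left[\log p_\theta(x)\right]
  &= -h(Q) - D_{\mathrm{KL}}(Q \,\|\, P_\theta), \\
\mathbb{E}_{x \sim P}\left[\log p_\theta(x)\right]
  - \mathbb{E}_{x \sim Q}\left[\log p_\theta(x)\right]
  &= \bigl(h(Q) - h(P)\bigr)
   + \bigl(D_{\mathrm{KL}}(Q \,\|\, P_\theta) - D_{\mathrm{KL}}(P \,\|\, P_\theta)\bigr).
\end{align}
```

Because $Z$ is independent of $X$, $h(Q') \ge h(Q)$, so replacing $Q$ with $Q'$ cannot shrink the entropy term of the gap. The KL term also changes under the perturbation, and this identity alone does not control it.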

Figures (4)

  • Figure 1: AUROC as a function of entropy-manipulation intensity, and log-likelihood assignments with and without perturbation. We increase $\sigma$ from 0.001 to 0.02 and add a perturbation $\mathbf{z} \sim \mathcal{N}(0, \sigma^2 I_d)$ to SVHN to create a noisy SVHN distribution, then perform likelihood-based OOD detection with Glow trained on CIFAR-10 and on CelebA. The histogram shows the log-likelihood assignments before and after perturbation when Glow is trained on CIFAR-10.
  • Figure 2: The overall framework of SPEM. $f$ is a density estimation model that provides a tractable likelihood for the input vector and estimates the in-distribution (ID) density, and $g$ is a feature extractor pretrained on general image data, capable of extracting sufficient features from each image.
  • Figure 3: AUROC as a function of $\sigma$ for in-distribution datasets composed of real images, when the OOD distribution is $\mathcal{N}(0, \sigma^2 I_d)$.
  • Figure 4: OOD detection performance as a function of $\alpha$, which controls the intensity of entropy manipulation. The legend lists the experimental dataset pairs, with the first dataset being in-distribution and the second OOD. All experiments use the same setup as the paper's main experiments table (tab:main_exp).
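The kind of effect shown in Figure 1 can be illustrated with a toy experiment. This sketch does not reproduce the figure: the Gaussian stand-ins for the datasets, the analytic standard-normal density used in place of a trained flow, and all function names are assumptions made for illustration.

```python
import numpy as np

def gaussian_log_likelihood(x):
    """Log-density of N(0, I_d); stands in for a trained density model."""
    d = x.shape[1]
    return -0.5 * (x ** 2).sum(axis=1) - 0.5 * d * np.log(2 * np.pi)

def auroc(scores_id, scores_ood):
    """Probability that an ID score exceeds an OOD score (rank-based, no ties)."""
    scores = np.concatenate([scores_id, scores_ood])
    ranks = np.empty(len(scores))
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)
    n_id, n_ood = len(scores_id), len(scores_ood)
    u = ranks[:n_id].sum() - n_id * (n_id + 1) / 2
    return u / (n_id * n_ood)

def toy_auroc_vs_sigma(sigmas, d=64, n=2000, seed=0):
    """AUROC of likelihood-based detection as noise is added to the OOD set."""
    rng = np.random.default_rng(seed)
    x_id = rng.standard_normal((n, d))          # hypothetical ID data
    x_ood = 0.5 * rng.standard_normal((n, d))   # lower-entropy OOD data
    ll_id = gaussian_log_likelihood(x_id)
    results = {}
    for sigma in sigmas:
        x_noisy = x_ood + sigma * rng.standard_normal((n, d))
        results[sigma] = auroc(ll_id, gaussian_log_likelihood(x_noisy))
    return results

results = toy_auroc_vs_sigma([0.0, 1.5])
```

Without noise, the lower-entropy OOD set receives higher likelihoods than the ID set, so AUROC falls far below 0.5. With enough noise the ordering flips and AUROC rises well above 0.5. The noise scales here are chosen for the toy setting and are unrelated to the $\sigma$ range reported in the figure.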

Theorems & Definitions (29)

  • Theorem 3.1
  • Theorem 4.1
  • Theorem 6.1
  • Theorem 6.2
  • Theorem 6.3
  • Theorem A.1
  • proof
  • Theorem A.2
  • proof
  • Theorem C.1
  • ...and 19 more