Rethinking Test-time Likelihood: The Likelihood Path Principle and Its Application to OOD Detection

Sicong Huang; Jiawei He; Kry Yik Chau Lui

TL;DR

This work tackles the unreliability of deep generative model likelihoods for out-of-distribution detection by introducing the Likelihood Path Principle (LPath) for VAEs. By focusing on minimal sufficient statistics of encoder/decoder conditional likelihoods, the authors derive non-asymptotic OOD guarantees through new concepts like nearly essential support, essential distance, and co-Lipschitzness, and present a two-stage, provably robust OOD method. The methodology yields state-of-the-art unsupervised OOD performance with simple, small VAEs, and demonstrates that carefully chosen, model-informed statistics can outperform more complex density estimation. The theoretical guarantees and practical algorithm offer a principled path for robust OOD detection in streaming, unsupervised settings, with clear avenues for extending to more powerful diffusion- or flow-based models.

Abstract

While likelihood is attractive in theory, its estimates by deep generative models (DGMs) are often broken in practice and perform poorly for out-of-distribution (OOD) detection. Several recent works consider alternative scores and achieve better performance. However, such recipes do not come with provable guarantees, nor is it clear that their choices extract sufficient information. We attempt to change this by conducting a case study on variational autoencoders (VAEs). First, we introduce the likelihood path (LPath) principle, generalizing the likelihood principle. This narrows the search for informative summary statistics down to the minimal sufficient statistics of VAEs' conditional likelihoods. Second, introducing new theoretical tools such as nearly essential support, essential distance and co-Lipschitzness, we obtain non-asymptotic provable OOD detection guarantees for certain distillations of the minimal sufficient statistics. The corresponding LPath algorithm demonstrates state-of-the-art performance, even using simple and small VAEs with poor likelihood estimates. To the best of our knowledge, this is the first provable unsupervised OOD method that delivers excellent empirical results, better than any other VAE-based technique. We use the same model as \cite{xiao2020likelihood}, open-sourced at: https://github.com/XavierXiao/Likelihood-Regret
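
To make the idea of essential separation concrete, the following is a minimal sketch for two partially overlapping one-dimensional Gaussians. It is not the paper's definition: it assumes that a nearly essential support at tolerance `eps` can be taken as the central interval carrying probability mass `1 - eps`, and that the essential distance is the gap between the two intervals. The helper names `central_interval` and `interval_gap`, and the chosen means, variances and tolerance, are hypothetical.

```python
from statistics import NormalDist


def central_interval(mu, sigma, eps):
    """Central interval of N(mu, sigma^2) carrying mass 1 - eps.

    Assumption: this interval stands in for a nearly essential support.
    """
    z = NormalDist().inv_cdf(1.0 - eps / 2.0)
    return mu - z * sigma, mu + z * sigma


def interval_gap(a, b):
    """Distance between two closed intervals; 0.0 if they overlap."""
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    return max(0.0, b_lo - a_hi, a_lo - b_hi)


# Two Gaussians whose supports coincide (the whole real line),
# yet whose high-mass regions do not touch.
iid = central_interval(0.0, 1.0, eps=0.05)
ood = central_interval(4.0, 1.0, eps=0.05)
gap = interval_gap(iid, ood)
print(iid, ood, gap)
```

Under these assumptions the two distributions overlap everywhere, but the intervals that hold 95% of each distribution's mass are separated by a small positive gap, which is the intuition behind "most of their samples are separable".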

Paper Structure (32 sections, 8 theorems, 53 equations, 11 figures, 5 tables)


Introduction
From the Likelihood Principle to the Likelihood Path Principle
From the Likelihood Path Principle to OOD Detection
Provable data and model dependent OOD detection performances
Essential Separation and Essential Distance
Generalizing Lipschitzness
Provable OOD detection performance guarantee for VAEs
Not All OOD Samples are Created Equal, Not All Statistics are Applied the Same
Methodology and Algorithm
Experiments
Conclusion
Appendix for Section \ref{sec:intro}
Related Work
Appendix for Section \ref{sec:theory_analysis}
Supplementary Materials for Section \ref{sec:essential_separation_distance}
...and 17 more sections

Key Result

Theorem 3.8

Fix $P_{\text{IID}}$, $P_{\text{OOD}}$, $m_{\text{intra}} > 0$ and $m_{\text{inter}} > 2 \cdot m_{\text{intra}}$. Assume without loss of generality the corresponding $\arg \min$ in Definition \ref{def:minimal_prob_distance} for $m_{\text{inter}}$ exists, denoted as: $(\epsilon^*_{\text{IID}}, \epsilon^*_{

Figures (11)

Figure 1: Main idea illustration. Left: the $\mathbf{x}_{\text{IID}}$ distribution (blue) and the $\mathbf{x}_{\text{OOD}}$ distribution (yellow) in the visible space; $\mathbf{x}_{\text{OOD}}$ is classified into four cases. Middle: the prior (turquoise), the posterior after observing $\mathbf{x}_{\text{IID}}$ (blue), and the posterior after observing $\mathbf{x}_{\text{OOD}}$, divided into four cases (yellow), in the latent space. Right: the reconstructed $\widehat{\mathbf{x}}_{\text{IID}}$ (red) on top of the real $\mathbf{x}_{\text{IID}}$ distribution (blue), and $\widehat{\mathbf{x}}_{\text{OOD}}$, again divided into four cases. The graphs for Cases (1) and (2) mean that $\widehat{\mathbf{x}}_{\text{OOD}}$ is well reconstructed, while the fried-egg-like shapes for Cases (3) and (4) indicate that $\widehat{\mathbf{x}}_{\text{OOD}}$ is poorly reconstructed. The grey area indicates pathological OOD regions where the VAE assigns high density but which occupy little volume. When integrated, these regions give nearly zero probability, and hence the data therein cannot be sampled in polynomial time. These are atypical sets.
Figure 3: Left: $\mathrm{Supt}(P_{\text{IID}})$ is the red solid line, which is decomposed into one nearly essential support (purple solid line) and less likely events (two green solid lines). Right: $\mathrm{Supt}(P_{\text{IID}})$ and $\mathrm{Supt}(P_{\text{OOD}})$ are the purple solid line. $\mathrm{ESupt}(P_{\text{IID}})$ is the blue solid line and $\mathrm{ESupt}(P_{\text{OOD}})$ is the red solid line. The green solid line depicts the corresponding essential distance, so they are essentially separable. The key idea is that for many overlapping distributions, most of their samples are separable.
Figure 4: Left: If $f$ is $L$-Lipschitz, it cannot (forward) push one small region $B_R(\mathbf{x})$ to a big one (diameter no more than $2LR$) - $f$ is not "one-to-many". Right: If $f$ is $(K, k)$ co-Lipschitz, its preimage $f^{-1}$ cannot (backward) pull one small region $B_R(\mathbf{y})$ to a big one (diameter no more than $2KR + k$) - $f$ is "one-to-one".
Figure 5: Illustration of the $v$ statistics in Equation \ref{eqn:test_statisics}. Region 1 (turquoise) and Region 3 (grey) are OOD regions; Region 2 (blue) is the IID latent manifold region. $\mu_{\mathbf{z}}(\mathbf{x})$ empirically concentrates around a spherical shell. To screen $\mathbf{x}_{\text{OOD}}$, we can track $\mathbf{z}_{\text{OOD}} := \mu_{\mathbf{z}}(\mathbf{x}_{\text{OOD}})$ and compute its distance to the IID latent manifold, $\inf_{\mathbf{z}_{\text{IID}}} \mathrm{d}(\mathbf{z}_{\text{IID}}, \mathbf{z}_{\text{OOD}})$. Since $\mathbf{z}_{\text{IID}}$ concentrates on some spherical shell of radius $r_0$, $\inf_{\mathbf{z}_{\text{IID}}} \mathrm{d} (\mathbf{z}_{\text{IID}}, \mathbf{z}_{\text{OOD}})$ can be efficiently approximated. This is one illustrative case; our reasoning holds even if $\mathbf{z}_{\text{OOD}}$ is in the blue or turquoise region.
Figure 6: Small test time reconstruction for IID.
...and 6 more figures
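
The Figure 5 caption notes that the distance from a latent code to the IID latent manifold can be approximated efficiently when IID codes concentrate on a spherical shell of radius $r_0$. A minimal sketch of that approximation follows. It assumes the shell is centered at the origin and idealized as a sphere, in which case the infimum distance reduces to $|\,\|\mathbf{z}\| - r_0\,|$; estimating $r_0$ as the mean norm of IID latent means is also an assumption. The function names `shell_radius` and `shell_distance` and the toy latent codes are hypothetical and do not come from the paper or its code release.

```python
import math


def norm(z):
    """Euclidean norm of a latent vector given as a sequence of floats."""
    return math.sqrt(sum(c * c for c in z))


def shell_radius(iid_latent_means):
    """Estimate r_0 as the mean norm of IID latent means (an assumption)."""
    norms = [norm(z) for z in iid_latent_means]
    return sum(norms) / len(norms)


def shell_distance(z, r0):
    """Distance from z to the origin-centered sphere of radius r0.

    For such a sphere, inf over sphere points of d(point, z) is | ||z|| - r0 |.
    """
    return abs(norm(z) - r0)


# Toy 2-D latent means standing in for mu_z(x) on IID training data.
iid_means = [(3.0, 4.0), (0.0, 5.0), (-5.0, 0.0)]
r0 = shell_radius(iid_means)

inside = shell_distance((0.0, 0.0), r0)    # code well inside the shell
outside = shell_distance((6.0, 8.0), r0)   # code well outside the shell
on_shell = shell_distance((3.0, 4.0), r0)  # code on the shell
print(r0, inside, outside, on_shell)
```

Taking the absolute value means a code is flagged whether it falls inside or outside the shell, which matches the caption's remark that the reasoning is not tied to one region.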

Theorems & Definitions (32)

Definition 3.1: Nearly essential support of a Distribution
Definition 3.2: Essential Distance
Definition 3.3: Essentially Separable between IID and OOD
Definition 3.4: Margin Essential Distance
Definition 3.5: L-Lipschitz: region-wise not "one-to-many"
Definition 3.6: Co-Lipschitz: region-wise "one-to-one"
Definition 3.7: IID reconstruction distance as intra-distribution margin
Theorem 3.8: Provable OOD detection
Definition B.1
Example B.2: Partially overlapping Gaussians
...and 22 more
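
The diameter bounds quoted in the Figure 4 caption can be recovered in two lines, under the assumption that Definitions 3.5 and 3.6 take the standard forms below (the exact statements are not reproduced on this page, so the forms are an assumption). Suppose $f$ is $L$-Lipschitz, meaning $\mathrm{d}(f(\mathbf{x}_1), f(\mathbf{x}_2)) \le L \, \mathrm{d}(\mathbf{x}_1, \mathbf{x}_2)$. Any two points in $B_R(\mathbf{x})$ are at most $2R$ apart, so

$$
\mathbf{x}_1, \mathbf{x}_2 \in B_R(\mathbf{x}) \;\Longrightarrow\; \mathrm{d}(f(\mathbf{x}_1), f(\mathbf{x}_2)) \le L \, \mathrm{d}(\mathbf{x}_1, \mathbf{x}_2) \le 2LR .
$$

Suppose instead $f$ is $(K, k)$ co-Lipschitz, meaning $\mathrm{d}(\mathbf{x}_1, \mathbf{x}_2) \le K \, \mathrm{d}(f(\mathbf{x}_1), f(\mathbf{x}_2)) + k$. Any two points whose images lie in $B_R(\mathbf{y})$ have images at most $2R$ apart, so

$$
\mathbf{x}_1, \mathbf{x}_2 \in f^{-1}(B_R(\mathbf{y})) \;\Longrightarrow\; \mathrm{d}(\mathbf{x}_1, \mathbf{x}_2) \le K \, \mathrm{d}(f(\mathbf{x}_1), f(\mathbf{x}_2)) + k \le 2KR + k .
$$

The first bound limits how far $f$ can spread a small region forward; the second limits how large the preimage of a small region can be, with the additive slack $k$ allowing $f$ to be non-injective at small scales.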
