Rethinking Test-time Likelihood: The Likelihood Path Principle and Its Application to OOD Detection

Sicong Huang, Jiawei He, Kry Yik Chau Lui

TL;DR

This work tackles the unreliability of deep generative model likelihoods for out-of-distribution detection by introducing the Likelihood Path Principle (LPath) for VAEs. By focusing on minimal sufficient statistics of encoder/decoder conditional likelihoods, the authors derive non-asymptotic OOD guarantees through new concepts like nearly essential support, essential distance, and co-Lipschitzness, and present a two-stage, provably robust OOD method. The methodology yields state-of-the-art unsupervised OOD performance with simple, small VAEs, and demonstrates that carefully chosen, model-informed statistics can outperform more complex density estimation. The theoretical guarantees and practical algorithm offer a principled path for robust OOD detection in streaming, unsupervised settings, with clear avenues for extending to more powerful diffusion- or flow-based models.

Abstract

While likelihood is attractive in theory, its estimates by deep generative models (DGMs) are often broken in practice and perform poorly for out-of-distribution (OOD) detection. Various recent works have considered alternative scores and achieved better performance. However, such recipes do not come with provable guarantees, nor is it clear that their choices extract sufficient information. We attempt to change this by conducting a case study on variational autoencoders (VAEs). First, we introduce the likelihood path (LPath) principle, which generalizes the likelihood principle. This narrows the search for informative summary statistics down to the minimal sufficient statistics of VAEs' conditional likelihoods. Second, by introducing new theoretical tools such as nearly essential support, essential distance and co-Lipschitzness, we obtain non-asymptotic provable OOD detection guarantees for a certain distillation of the minimal sufficient statistics. The corresponding LPath algorithm demonstrates SOTA performance, even when using simple and small VAEs with poor likelihood estimates. To the best of our knowledge, this is the first provable unsupervised OOD method that delivers excellent empirical results, better than any other VAE-based technique. We use the same model as \cite{xiao2020likelihood}, open sourced at: https://github.com/XavierXiao/Likelihood-Regret

Paper Structure

This paper contains 32 sections, 8 theorems, 53 equations, 11 figures, 5 tables.

Key Result

Theorem 3.8

Fix $P_{\text{IID}}$, $P_{\text{OOD}}$, $m_{\text{intra}} > 0$ and $m_{\text{inter}} > 2 \cdot m_{\text{intra}}$. Assume without loss of generality that the corresponding $\arg \min$ in Definition \ref{def:minimal_prob_distance} for $m_{\text{inter}}$ exists, denoted as: $(\epsilon^*_{\text{IID}}, \epsilon^*_{

Figures (11)

  • Figure 1: Main idea illustration. Left: the $\mathbf{x}_{\text{IID}}$ distribution (blue) and the $\mathbf{x}_{\text{OOD}}$ distribution (yellow) in the visible space. $\mathbf{x}_{\text{OOD}}$ is classified into four cases. Middle: the prior (turquoise), the posterior after observing $\mathbf{x}_{\text{IID}}$ (blue), and the posterior after observing $\mathbf{x}_{\text{OOD}}$ (yellow), divided into four cases, in the latent space. Right: the reconstructed $\widehat{\mathbf{x}}_{\text{IID}}$ (red) on top of the real $\mathbf{x}_{\text{IID}}$ distribution (blue), and $\widehat{\mathbf{x}}_{\text{OOD}}$, again divided into four cases. The graphs for Cases (1) and (2) mean that $\widehat{\mathbf{x}}_{\text{OOD}}$ is well reconstructed, while the fried-egg-like shapes for Cases (3) and (4) indicate that $\widehat{\mathbf{x}}_{\text{OOD}}$ is poorly reconstructed. The grey area indicates pathological OOD regions to which the VAE assigns high density but which occupy little volume. When integrated, these regions give nearly zero probability, and hence the data therein cannot be sampled in polynomial time. These are atypical sets.
  • Figure 3: Left: $\mathrm{Supt}(P_{\text{IID}})$ is the red solid line, which is decomposed into one nearly essential support (purple solid line) and less likely events (two green solid lines). Right: $\mathrm{Supt}(P_{\text{IID}})$ and $\mathrm{Supt}(P_{\text{OOD}})$ are the purple solid line. $\mathrm{ESupt}(P_{\text{IID}})$ is the blue solid line and $\mathrm{ESupt}(P_{\text{OOD}})$ is the red solid line. The green solid line depicts the corresponding essential distance, so the two distributions are essentially separable. The key idea is that for many overlapping distributions, most of their samples are separable.
  • Figure 4: Left: If $f$ is $L$-Lipschitz, it cannot (forward) push one small region $B_R(\mathbf{x})$ to a big one (diameter no more than $2LR$) - $f$ is not "one-to-many". Right: If $f$ is $(K, k)$ co-Lipschitz, its preimage $f^{-1}$ cannot (backward) pull one small region $B_R(\mathbf{y})$ to a big one (diameter no more than $2KR + k$) - $f$ is "one-to-one".
  • Figure 5: Illustration of the $v$ statistics in Equation \ref{eqn:test_statisics}. Region 1 (turquoise) and Region 3 (grey) indicate OOD regions; Region 2 (blue) is the IID latent manifold region. $\mu_{\mathbf{z}}(\mathbf{x})$ empirically concentrates around a spherical shell. To screen $\mathbf{x}_{\text{OOD}}$, we can track $\mathbf{z}_{\text{OOD}} := \mu_{\mathbf{z}}(\mathbf{x}_{\text{OOD}})$ and compute its distance to the IID latent manifold, $\inf_{\mathbf{z}_{\text{IID}}} \mathrm{d}(\mathbf{z}_{\text{IID}}, \mathbf{z}_{\text{OOD}})$. Since $\mathbf{z}_{\text{IID}}$ concentrates on a spherical shell of radius $r_0$, $\inf_{\mathbf{z}_{\text{IID}}} \mathrm{d} (\mathbf{z}_{\text{IID}}, \mathbf{z}_{\text{OOD}})$ can be efficiently approximated. This is one illustrative case; our reasoning holds even if $\mathbf{z}_{\text{OOD}}$ is in the blue or turquoise region.
  • Figure 6: Small test time reconstruction for IID.
  • ...and 6 more figures
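
The Figure 5 caption says that the distance from a latent code to the IID latent manifold can be approximated efficiently once the IID codes concentrate on a spherical shell of radius $r_0$. A minimal sketch of one way this could look is given below. It is not the authors' implementation: it assumes the shell is centred at the origin, uses the Euclidean distance, estimates $r_0$ as the mean norm of the IID codes, and replaces $\mu_{\mathbf{z}}(\mathbf{x})$ with synthetic Gaussian vectors. All function names are hypothetical.

```python
import math
import random


def norm(z):
    """Euclidean norm of a latent vector given as a list of floats."""
    return math.sqrt(sum(v * v for v in z))


def estimate_shell_radius(z_iid):
    """Assumption: estimate r_0 as the mean norm of the IID latent codes."""
    return sum(norm(z) for z in z_iid) / len(z_iid)


def shell_distance(z, r0):
    """Distance from z to the origin-centred sphere of radius r0.

    For a full sphere, the infimum over points on the sphere has the
    closed form | ||z|| - r0 |, so no nearest-neighbour search is needed.
    """
    return abs(norm(z) - r0)


def gaussian_codes(rng, n, dim, scale):
    """Synthetic stand-in for mu_z(x): isotropic Gaussian codes (assumption)."""
    return [[scale * rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
dim = 64

# Stand-in IID codes: in high dimension their norms concentrate near sqrt(dim).
z_iid = gaussian_codes(rng, 1000, dim, 1.0)
r0 = estimate_shell_radius(z_iid)

# Stand-in OOD codes landing inside and outside the shell.
z_inner = gaussian_codes(rng, 5, dim, 0.2)
z_outer = gaussian_codes(rng, 5, dim, 3.0)

iid_mean_distance = sum(shell_distance(z, r0) for z in z_iid) / len(z_iid)
inner_distances = [shell_distance(z, r0) for z in z_inner]
outer_distances = [shell_distance(z, r0) for z in z_outer]

print(f"estimated r0: {r0:.2f}")
print(f"mean IID distance to shell: {iid_mean_distance:.2f}")
print(f"min inner-region distance: {min(inner_distances):.2f}")
print(f"min outer-region distance: {min(outer_distances):.2f}")
```

Under these assumptions, codes that fall well inside or well outside the shell receive a much larger score than typical IID codes, which is the behaviour the caption describes for Regions 1 and 3. Codes that land on the shell itself are not separated by this statistic alone.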

Theorems & Definitions (32)

  • Definition 3.1: Nearly essential support of a Distribution
  • Definition 3.2: Essential Distance
  • Definition 3.3: Essentially Separable between IID and OOD
  • Definition 3.4: Margin Essential Distance
  • Definition 3.5: L-Lipschitz: region-wise not "one-to-many"
  • Definition 3.6: Co-Lipschitz: region-wise "one-to-one"
  • Definition 3.7: IID reconstruction distance as intra-distribution margin
  • Theorem 3.8: Provable OOD detection
  • Definition B.1
  • Example B.2: Partially overlapping Gaussians
  • ...and 22 more
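
Definitions 3.5 and 3.6 are listed here by title only. The block below writes out one standard formalization that is consistent with the diameter bounds quoted in the Figure 4 caption ($2LR$ and $2KR + k$). The metrics $d_X$, $d_Y$ and the exact form of the inequalities are assumptions; the paper's own statements may differ in detail.

```latex
% Assumed setting: f : (X, d_X) \to (Y, d_Y).

% L-Lipschitz (region-wise not "one-to-many"):
d_Y\bigl(f(\mathbf{x}), f(\mathbf{x}')\bigr) \le L \, d_X(\mathbf{x}, \mathbf{x}')
\quad \text{for all } \mathbf{x}, \mathbf{x}' \in X .

% Consequence: any two points of B_R(\mathbf{x}) are at most 2R apart, so
\operatorname{diam} f\bigl(B_R(\mathbf{x})\bigr) \le 2LR .

% (K, k) co-Lipschitz (region-wise "one-to-one"):
d_X(\mathbf{x}, \mathbf{x}') \le K \, d_Y\bigl(f(\mathbf{x}), f(\mathbf{x}')\bigr) + k
\quad \text{for all } \mathbf{x}, \mathbf{x}' \in X .

% Consequence: if f(\mathbf{x}), f(\mathbf{x}') \in B_R(\mathbf{y}), their images
% are at most 2R apart, so
\operatorname{diam} f^{-1}\bigl(B_R(\mathbf{y})\bigr) \le 2KR + k .
```

Read together, the first bound limits how far $f$ can spread a small input region, and the second limits how large the set of inputs mapped into a small output region can be.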