ConjNorm: Tractable Density Estimation for Out-of-Distribution Detection

Bo Peng, Yadan Luo, Yonggang Zhang, Yixuan Li, Zhen Fang

TL;DR

This work introduces ConjNorm, a density-based, post-hoc OOD detection framework grounded in a Bregman-divergence theory over the exponential family. By optimizing the norm coefficient $p$ (with conjugate $q$ where $1/p+1/q=1$) and employing a tractable, unbiased importance-sampling estimator for the partition function $\Phi(k)$, ConjNorm achieves state-of-the-art OOD detection performance across CIFAR and ImageNet benchmarks. The approach unifies prior density-based, logit-based, and distance-based methods under a single principled framework and demonstrates strong empirical gains across standard, hard, and long-tailed OOD settings. The combination of theoretical guarantees and practical IS-based normalization enables robust density-based scoring that improves FPR95 and AUROC, with implications for safer deployment of classifiers in open-world environments.
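The conjugate relation $1/p + 1/q = 1$ mentioned above can be made concrete in a few lines of Python. This is a minimal sketch rather than code from the paper: the helper names `conjugate_exponent` and `lp_norm` are hypothetical, and the value $p = 2.5$ is chosen only for illustration.

```python
import math


def conjugate_exponent(p: float) -> float:
    """Return q such that 1/p + 1/q = 1 (hypothetical helper, requires p > 1)."""
    if p <= 1.0:
        raise ValueError("p must be greater than 1 to have a finite conjugate")
    return p / (p - 1.0)


def lp_norm(x, p: float) -> float:
    """Plain l_p norm of a sequence of floats: (sum |x_i|^p)^(1/p)."""
    return sum(abs(v) ** p for v in x) ** (1.0 / p)


if __name__ == "__main__":
    p = 2.5  # illustrative value only
    q = conjugate_exponent(p)  # roughly 1.667
    print("p =", p, "q =", q, "1/p + 1/q =", 1.0 / p + 1.0 / q)
    print("l_p norm of [3, 4] with p = 2:", lp_norm([3.0, 4.0], 2.0))
```

Note that $p = 2$ is its own conjugate, so the Euclidean case is the one setting in which the two norms coincide.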

Abstract

Post-hoc out-of-distribution (OOD) detection has garnered intensive attention in reliable machine learning. Many efforts have been dedicated to deriving score functions based on logits, distances, or rigorous data distribution assumptions to identify low-scoring OOD samples. Nevertheless, these estimated scores may fail to accurately reflect the true data density or impose impractical constraints. To provide a unified perspective on density-based score design, we propose a novel theoretical framework grounded in Bregman divergence, which extends distribution considerations to encompass an exponential family of distributions. Leveraging the conjugation constraint revealed in our theorem, we introduce a \textsc{ConjNorm} method, reframing density function design as a search for the optimal norm coefficient $p$ against the given dataset. In light of the computational challenges of normalization, we devise an unbiased and analytically tractable estimator of the partition function using the Monte Carlo-based importance sampling technique. Extensive experiments across OOD detection benchmarks empirically demonstrate that our proposed \textsc{ConjNorm} has established a new state-of-the-art in a variety of OOD detection setups, outperforming the current best method by up to 13.25$\%$ and 28.19$\%$ (FPR95) on CIFAR-100 and ImageNet-1K, respectively.
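To illustrate what an unbiased importance-sampling estimate of a partition function looks like in general, here is a one-dimensional toy sketch. It is not the estimator from the paper: the unnormalized density $\exp(-|x|^p)$, the Gaussian proposal, and all function names are assumptions chosen so that the result can be checked against the closed form $2\,\Gamma(1 + 1/p)$.

```python
import math
import random


def unnormalized_density(x: float, p: float) -> float:
    """Toy unnormalized density f(x) = exp(-|x|^p) (an assumption for this sketch)."""
    return math.exp(-abs(x) ** p)


def gaussian_pdf(x: float, sigma: float) -> float:
    """Density of the zero-mean Gaussian proposal g(x)."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def estimate_partition(p: float, n_samples: int = 100_000,
                       sigma: float = 1.0, seed: int = 0) -> float:
    """Importance-sampling estimate of Z = integral of f(x) dx.

    Draw x_i from g and average the weights f(x_i) / g(x_i); the expectation
    of each weight under g equals Z, so the sample mean is unbiased.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(0.0, sigma)
        total += unnormalized_density(x, p) / gaussian_pdf(x, sigma)
    return total / n_samples


def exact_partition(p: float) -> float:
    """Closed form for the toy density: Z = 2 * Gamma(1 + 1/p)."""
    return 2.0 * math.gamma(1.0 + 1.0 / p)


if __name__ == "__main__":
    p = 2.5  # illustrative value only
    print("estimate:", estimate_partition(p))
    print("exact:   ", exact_partition(p))
```

The Gaussian proposal works well here because, for $p \ge 2$, the target's tails are no heavier than the proposal's, which keeps the importance weights bounded. A proposal with lighter tails than the target would still be unbiased but could have very high variance.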

Paper Structure

This paper contains 30 sections, 5 theorems, 32 equations, 5 figures, 12 tables.

Key Result

Theorem 1

Suppose that $\psi(\cdot)$ and $\varphi(\cdot)$ are conjugate Legendre functions. Let $\hat{p}_{\boldsymbol{\theta}}(\mathbf{z} \mid k)$ be a member of the exponential family conditioned on the $k$-th ID class with cumulant function $\varphi$ and parameters $\boldsymbol{\eta}_k$ ($k = 1, \ldots, K$),

Figures (5)

  • Figure 1: Illustration of the alignment of GEM score and true density of Gaussian (Left) and Gamma (Right) distributions.
  • Figure 2: Evaluations of different partition function estimation baselines on ImageNet with MobileNetV2 (Left) and ResNet50 (Right).
  • Figure 3: Ablation study using features extracted from (a) the first, (b) the second, and (c) the last dense block of DenseNet on CIFAR-10.
  • Figure 4: Ablation study w.r.t. the sampling ratio $\alpha$ (red) and the norm coefficient $p$ (blue).
  • Figure 5: Comparisons of varying $q$ when $p$ is fixed at 2.5 (Left) and 3.0 (Right) on CIFAR-100.

Theorems & Definitions (10)

  • Definition 1: Exponential Family of Distributions
  • Definition 2: Bregman Divergence
  • Theorem 1
  • Definition 3: Total Variation
  • Lemma 1
  • Definition 4: KL Divergence
  • Lemma 2: Relation between Bregman Divergences and KL Divergence
  • Lemma 3: Pinsker's inequality
  • Theorem 2
  • proof
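
Several of the items listed above (Bregman divergence, KL divergence, total variation, Pinsker's inequality) have standard textbook forms that can be checked numerically. The sketch below uses those standard definitions, which may differ in notation or generality from the paper's own statements, since those are not reproduced here. It verifies two well-known facts on a pair of example probability vectors: the Bregman divergence generated by negative entropy equals the KL divergence, and total variation is bounded by $\sqrt{\mathrm{KL}/2}$. All function names and the example vectors are illustrative.

```python
import math


def bregman_neg_entropy(x, y) -> float:
    """Bregman divergence d_psi(x, y) = psi(x) - psi(y) - <x - y, grad psi(y)>
    for the generator psi(v) = sum_i v_i * log(v_i) (negative entropy)."""
    def psi(v):
        return sum(vi * math.log(vi) for vi in v)

    grad_y = [math.log(yi) + 1.0 for yi in y]
    inner = sum((xi - yi) * gi for xi, yi, gi in zip(x, y, grad_y))
    return psi(x) - psi(y) - inner


def kl_divergence(p, q) -> float:
    """KL(p || q) for strictly positive discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def total_variation(p, q) -> float:
    """Total variation distance: half the l_1 distance between p and q."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


if __name__ == "__main__":
    p = [0.5, 0.3, 0.2]  # example distributions, chosen arbitrarily
    q = [0.2, 0.5, 0.3]
    kl = kl_divergence(p, q)
    print("Bregman (negative entropy):", bregman_neg_entropy(p, q))
    print("KL divergence:             ", kl)
    print("TV:", total_variation(p, q), "<= sqrt(KL / 2):", math.sqrt(kl / 2.0))
```

The equality between the two divergences holds here because both vectors sum to one, so the linear terms in the Bregman expansion cancel.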