VMF-GOS: Geometry-guided virtual Outlier Synthesis for Long-Tailed OOD Detection

Ningkang Peng, Qianfeng Yu, Yuhao Zhang, Yafei Liu, Xiaoqian Peng, Peirong Ma, Yi Chen, Peiheng Li, Yanhui Gu

TL;DR

The paper tackles the challenge of detecting OOD samples under long-tailed distributions without external data. It introduces VMF-GOS, which uses a vMF mixture on the hypersphere to model ID geometry and a Geometry-guided Outlier Synthesis (GOS) mechanism to generate boundary outliers in low-likelihood regions. Three objectives regulate the feature space and energy landscape: Dual-Granularity Semantic Loss (DGS), Temperature Scaling-Based Logit Adjustment (TLA), and Energy Polarization Regularization (EPR). ODIN-style post-processing is applied at test time for robustness. Empirical results on CIFAR-LT benchmarks show state-of-the-art performance compared to both data-free and external-outlier methods, highlighting the practical potential of data-free boundary synthesis for long-tailed OOD detection.

Abstract

Out-of-Distribution (OOD) detection under long-tailed distributions is highly challenging because the scarcity of samples in tail classes leads to blurred decision boundaries in the feature space. Current state-of-the-art (SOTA) methods typically employ Outlier Exposure (OE) strategies, relying on large-scale real external datasets (such as 80 Million Tiny Images) to regularize the feature space. However, this dependence on external data is often infeasible in practical deployment due to high data acquisition costs and privacy sensitivity. To this end, we propose a novel data-free framework that completely eliminates reliance on external datasets while maintaining superior detection performance. We introduce a Geometry-guided virtual Outlier Synthesis (GOS) strategy that models statistical properties using the von Mises-Fisher (vMF) distribution on a hypersphere. Specifically, we locate a low-likelihood annulus in the feature space and perform directional sampling of virtual outliers in this region. Simultaneously, we introduce a new Dual-Granularity Semantic Loss (DGS) that uses contrastive learning to maximize the distinction between in-distribution (ID) features and these synthesized boundary outliers. Extensive experiments on benchmarks such as CIFAR-LT demonstrate that our method outperforms SOTA approaches that use external real images.

Paper Structure

This paper contains 12 sections, 1 theorem, 13 equations, 4 figures, 4 tables.

Key Result

Theorem 4.1

Consider a $d$-dimensional feature vector $z \in \mathbb{S}^{d-1}$ following a vMF distribution with mean direction $\mu_k$ and concentration parameter $\kappa$. Under high-dimensional concentration ($d \to \infty$ and large $\kappa$), the scaled angular displacement $\xi = 2\kappa(1 - t)$ asymptotically follows a $\chi^2$ distribution, where $t = \mu_k^\top z$ denotes the cosine similarity.
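
To make the role of $\xi$ concrete, here is a rough worked illustration rather than the paper's stated form. It assumes the classical large-$\kappa$ limit, in which the displacement is approximately $\chi^2$ with $d-1$ degrees of freedom (mean $d-1$, standard deviation $\sqrt{2(d-1)}$); the degrees of freedom and the band limits $a$ and $b$ are assumptions introduced for this sketch. A band from $a$ to $b$ standard deviations above the mean then maps to an interval of cosine similarities:

$$
\xi \in \Big[(d-1) + a\sqrt{2(d-1)},\ (d-1) + b\sqrt{2(d-1)}\Big]
\quad\Longleftrightarrow\quad
t = 1 - \frac{\xi}{2\kappa} \in \Big[1 - \frac{(d-1) + b\sqrt{2(d-1)}}{2\kappa},\ 1 - \frac{(d-1) + a\sqrt{2(d-1)}}{2\kappa}\Big].
$$

For instance, with the hypothetical values $d = 64$, $\kappa = 400$, $a = 2$, and $b = 3$, this gives $\xi \in [85.4, 96.7]$ and hence $t \in [0.879, 0.893]$: the band is a thin annulus of directions slightly farther from $\mu_k$ than typical in-distribution features.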

Figures (4)

  • Figure 1: UMAP visualization of feature manifolds on CIFAR-100 with LSUN as the OOD dataset. (a) The DARL baseline exhibits substantial overlap between tail classes and OOD samples, leading to blurred decision boundaries. (b) Our GOS method fosters structural compactness within ID clusters and establishes a distinct separation margin for OOD detection.
  • Figure 2: Overview of the proposed VMF-GOS framework. During the Train Phase, the GOS module leverages the asymptotic equivalence between high-dimensional hyperspherical similarity and the $\chi^{2}$ distribution to directionally sample virtual outliers within low-likelihood annular regions. The model is optimized via a joint objective function consisting of ID semantic alignment ($\mathcal{L}_{\Psi_{id}}$), OOD boundary constraints ($\mathcal{L}_{\Omega_{ood}}$), Energy Polarization Regularization ($\mathcal{L}_{epr}$), and Temperature Scaling-Based Logit Adjustment ($\mathcal{L}_{tla}$), effectively achieving feature decoupling and boundary compression. During the Test Phase, the ODIN post-processing mechanism is employed to amplify the score disparity between ID and OOD samples via gradient-based input perturbations, ensuring robust detection performance under long-tailed distributions.
  • Figure 3: Sensitivity analysis of sampling interval ($\sigma$ range) in the GOS strategy. Results are reported on CIFAR-10 (left) and CIFAR-100 (right). We evaluate the trends of AUROC (blue solid line) and Accuracy (red dashed line) with respect to different standard deviation ranges of the $\chi^{2}$ distribution. The shaded region indicates our optimal sampling interval of $2\sigma$ to $3\sigma$.
  • Figure 4: Performance across class frequencies on long-tailed CIFAR-10 (a) and CIFAR-100 (b). Classes are categorized into Low, Median, and Many shots based on their sample cardinality. We compare our method against the DARL baseline in terms of AUROC and Accuracy.
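
The annular sampling described in the Figure 2 and Figure 3 captions can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `sample_virtual_outlier`, the choice of $d-1$ degrees of freedom for the $\chi^2$ law, the uniform draw within the band, and the uniformly random tangent direction are all hypothetical. It uses only the Python standard library so that it stays self-contained.

```python
import math
import random


def sample_virtual_outlier(mu, kappa, lo=2.0, hi=3.0, rng=random):
    """Draw one unit vector whose scaled displacement xi = 2*kappa*(1 - cos)
    lies in the [lo, hi]-sigma band above the chi-squared mean.

    Assumptions (not from the paper): d-1 degrees of freedom, a uniform
    draw of xi inside the band, and a uniformly random tangent direction.
    """
    d = len(mu)
    norm = math.sqrt(sum(m * m for m in mu))
    mu = [m / norm for m in mu]  # make sure the mean direction is unit length

    dof = d - 1
    mean, std = dof, math.sqrt(2.0 * dof)  # chi-squared mean and std
    xi = rng.uniform(mean + lo * std, mean + hi * std)

    # Convert the displacement back to a target cosine similarity with mu.
    t = max(-1.0, min(1.0, 1.0 - xi / (2.0 * kappa)))

    # Random unit vector orthogonal to mu (Gaussian draw, projected out).
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    proj = sum(a * b for a, b in zip(v, mu))
    v = [a - proj * b for a, b in zip(v, mu)]
    v_norm = math.sqrt(sum(a * a for a in v))

    # Combine: cosine t along mu, sine sqrt(1 - t^2) along the tangent.
    s = math.sqrt(1.0 - t * t)
    return [t * b + s * a / v_norm for a, b in zip(v, mu)]


# Usage with hypothetical values: a 64-dimensional class mean, kappa = 400.
rng = random.Random(0)
mu = [1.0] + [0.0] * 63
z = sample_virtual_outlier(mu, kappa=400.0, rng=rng)
cosine = sum(a * b for a, b in zip(z, mu))
```

Because the sample is built as a cosine component along the class mean plus a sine component in the tangent space, it has unit norm by construction and its cosine similarity to the mean is exactly the value implied by the drawn displacement. If `kappa` is small relative to `d`, the target cosine is clipped to $[-1, 1]$ and the band no longer has its intended meaning.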

Theorems & Definitions (2)

  • Theorem 4.1: Asymptotic $\chi^2$ Equivalence
  • Proof: Proof Sketch