VMF-GOS: Geometry-guided virtual Outlier Synthesis for Long-Tailed OOD Detection
Ningkang Peng, Qianfeng Yu, Yuhao Zhang, Yafei Liu, Xiaoqian Peng, Peirong Ma, Yi Chen, Peiheng Li, Yanhui Gu
TL;DR
The paper addresses out-of-distribution (OOD) detection under long-tailed class distributions without any external outlier data. It introduces VMF-GOS, which models in-distribution (ID) geometry with a von Mises-Fisher (vMF) mixture on the hypersphere and uses a Geometry-guided virtual Outlier Synthesis (GOS) mechanism to generate boundary outliers in low-likelihood regions. Three objectives, Dual-Granularity Semantic Loss (DGS), Temperature Scaling-Based Logit Adjustment (TLA), and Energy Polarization Regularization (EPR), regulate the feature space and the energy landscape, and ODIN-style post-processing is applied for robustness. On CIFAR-LT benchmarks the method reports state-of-the-art results against both data-free and external-outlier baselines, which suggests that data-free boundary synthesis is practical for long-tailed OOD detection.
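For readers unfamiliar with the term "energy landscape", the sketch below computes the free-energy score commonly used in energy-based OOD detection, E(x) = -T * logsumexp(f(x) / T), where f(x) are the classifier logits. This is an assumption about what "energy" refers to here; the summary does not define the EPR loss, and the function name `energy_score` and the `temperature` parameter are illustrative only, not the authors' implementation.

```python
import numpy as np

def energy_score(logits, temperature=1.0):
    """Free-energy score E(x) = -T * logsumexp(logits / T) (illustrative, not the paper's EPR loss).

    Lower energy usually corresponds to more confident, ID-like inputs.
    """
    scaled = np.asarray(logits, dtype=float) / temperature
    # Subtract the row-wise maximum before exponentiating for numerical stability.
    m = scaled.max(axis=-1, keepdims=True)
    lse = m.squeeze(-1) + np.log(np.exp(scaled - m).sum(axis=-1))
    return -temperature * lse

# A peaked logit vector gets lower energy than a flat one.
confident = energy_score(np.array([[10.0, 0.0, 0.0]]))[0]
flat = energy_score(np.array([[0.0, 0.0, 0.0]]))[0]
```

Under this convention a regularizer that "polarizes" energies would presumably push ID samples toward low energy and synthesized outliers toward high energy, but the exact form used by the authors is not stated in this summary.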
Abstract
Out-of-Distribution (OOD) detection under long-tailed distributions is highly challenging because the scarcity of samples in tail classes blurs decision boundaries in the feature space. Current state-of-the-art (SOTA) methods typically employ Outlier Exposure (OE) strategies, relying on large-scale real external datasets (such as 80 Million Tiny Images) to regularize the feature space. However, this dependence on external data is often infeasible in practical deployment because of high data acquisition costs and privacy sensitivity. To address this, we propose a novel data-free framework that eliminates reliance on external datasets while maintaining superior detection performance. We introduce a Geometry-guided virtual Outlier Synthesis (GOS) strategy that models the statistical properties of ID features using the von Mises-Fisher (vMF) distribution on a hypersphere. Specifically, we locate a low-likelihood annulus in the feature space and perform directional sampling of virtual outliers in this region. In addition, we introduce a new Dual-Granularity Semantic Loss (DGS) that uses contrastive learning to maximize the distinction between in-distribution (ID) features and these synthesized boundary outliers. Extensive experiments on benchmarks such as CIFAR-LT demonstrate that our method outperforms SOTA approaches that use external real images.
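The annulus sampling described in the abstract can be illustrated with a minimal sketch, under simplifying assumptions that are not taken from the paper. It uses a single vMF component for one class rather than a mixture. Because the vMF log-density is kappa * mu^T z plus a constant, low likelihood corresponds to low cosine similarity to the mean direction mu, so the annulus is defined here as a band of cosine values just below a low quantile of the ID cosines. The function name `sample_annulus_outliers` and the parameters `q` and `width` are hypothetical.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Project vectors onto the unit hypersphere."""
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)

def sample_annulus_outliers(features, n_samples, q=0.05, width=0.2, rng=None):
    """Sample unit vectors in a low-likelihood band around one class (illustrative sketch).

    features : (n, d) array of ID features for a single class.
    q        : quantile of ID cosine similarities that marks the band's inner edge (assumed).
    width    : extent of the band in cosine units (assumed).
    """
    rng = np.random.default_rng() if rng is None else rng
    z = l2_normalize(np.asarray(features, dtype=float))
    mu = l2_normalize(z.mean(axis=0))            # estimated vMF mean direction
    cos_id = z @ mu                              # cosine of each ID feature to mu
    c_hi = float(np.quantile(cos_id, q))         # boundary of the ID region
    c_lo = max(c_hi - width, -1.0)               # far edge of the band
    t = rng.uniform(c_lo, c_hi, size=n_samples)  # target cosine per outlier
    # Draw random directions and remove their component along mu (tangent directions).
    g = rng.standard_normal((n_samples, z.shape[1]))
    g -= np.outer(g @ mu, mu)
    v = l2_normalize(g)
    # Combine so each sample has unit norm and cosine exactly t with mu.
    radial = np.sqrt(np.clip(1.0 - t**2, 0.0, None))
    outliers = t[:, None] * mu + radial[:, None] * v
    return outliers, mu, (c_lo, c_hi)

# Toy usage: synthetic ID features clustered around the first coordinate axis.
rng = np.random.default_rng(0)
feats = rng.standard_normal((200, 16)) * 0.3 + np.eye(16)[0] * 3.0
virtual, mu, (c_lo, c_hi) = sample_annulus_outliers(feats, 50, rng=rng)
```

The synthesized points lie on the unit sphere just outside the bulk of the ID features, which is where a contrastive objective such as DGS would need negatives. How the authors choose the band and the sampling direction in practice may differ from this sketch.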
