Long-Tailed Out-of-Distribution Detection: Prioritizing Attention to Tail

Yina He, Lei Peng, Yongcun Zhang, Juanjuan Weng, Zhiming Luo, Shaozi Li

TL;DR

This work tackles long-tailed OOD detection by decoupling ID data balance from OOD separation and introducing PATT, a framework that combines temperature scaling-based implicit semantic augmentation with post-hoc feature calibration. The training component ISAC models ID features as a mixture of von Mises-Fisher distributions on a hypersphere, enabling a closed-form, infinitely sampled contrastive loss, while the classifier is sharpened via a temperature-scaled logit adjustment. During inference, a tail-focused attention mechanism recalibrates features to balance head and tail representations and to suppress OOD confidences, using an energy-based OOD score. Across CIFAR10/100-LT and ImageNet-LT, PATT yields substantial gains in AUROC and tail/class accuracy, outperforming prior long-tailed OOD methods and demonstrating strong robustness to hyperparameters and model architectures. The approach offers a practical, end-to-end solution with improved long-tailed recognition and reliable OOD detection in real-world settings.
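The energy-based OOD score mentioned above can be illustrated with a minimal sketch. It assumes the widely used free-energy form E(x) = -T * logsumexp(f(x) / T) over classifier logits; the summary does not state the exact variant PATT uses, and the function names, the `temperature` parameter, and the `threshold` value are hypothetical choices made only for illustration.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def energy_score(logits, temperature=1.0):
    """Free energy -T * logsumexp(logits / T).

    Lower energy means the classifier is more confident, which is
    typically read as "more likely in-distribution".
    """
    scaled = [v / temperature for v in logits]
    return -temperature * logsumexp(scaled)


def is_ood(logits, threshold, temperature=1.0):
    """Flag a sample as OOD when its energy exceeds a chosen threshold.

    The threshold is an assumption here; in practice it would be picked
    on held-out ID data (for example, at a fixed true-positive rate).
    """
    return energy_score(logits, temperature) > threshold


# A peaked logit vector has lower energy than a flat one.
confident = [10.0, 0.0, 0.0]
uncertain = [1.0, 1.0, 1.0]
print(energy_score(confident) < energy_score(uncertain))
```

Under this convention, suppressing OOD confidence flattens the logits of OOD inputs, which raises their energy and widens the gap to ID samples.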

Abstract

Current out-of-distribution (OOD) detection methods typically assume balanced in-distribution (ID) data, while most real-world data follow a long-tailed distribution. Previous approaches to long-tailed OOD detection often involve balancing the ID data by reducing the semantics of head classes. However, this reduction can severely affect the classification accuracy of ID data. The main challenge of this task lies in the severe lack of features for tail classes, leading to confusion with OOD data. To tackle this issue, we introduce a novel Prioritizing Attention to Tail (PATT) method using augmentation instead of reduction. Our main intuition involves using a mixture of von Mises-Fisher (vMF) distributions to model the ID data and a temperature scaling module to boost the confidence of ID data. This enables us to generate infinite contrastive pairs, implicitly enhancing the semantics of ID classes while promoting differentiation between ID and OOD data. To further strengthen the detection of OOD data without compromising the classification performance of ID data, we propose feature calibration during the inference phase. By extracting an attention weight from the training set that prioritizes the tail classes and reduces the confidence in OOD data, we improve the OOD detection capability. Extensive experiments verified that our method outperforms the current state-of-the-art methods on various benchmarks.
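For reference, the standard form of a von Mises-Fisher mixture on the unit hypersphere is sketched below. The notation is an assumption, since the abstract does not fix symbols: $z$ is a unit-norm feature in $\mathbb{R}^p$, $\mu_y$ and $\kappa_y$ are the mean direction and concentration of class $y$, $\pi_y$ is the class prior, $K$ is the number of ID classes, and $I_\nu$ is the modified Bessel function of the first kind.

```latex
\begin{aligned}
p(z \mid y) &= C_p(\kappa_y)\,\exp\!\left(\kappa_y\,\mu_y^{\top} z\right),
  \qquad \lVert z \rVert = \lVert \mu_y \rVert = 1,\ \kappa_y \ge 0, \\
C_p(\kappa) &= \frac{\kappa^{\,p/2-1}}{(2\pi)^{p/2}\, I_{p/2-1}(\kappa)}, \\
p(z) &= \sum_{y=1}^{K} \pi_y\, p(z \mid y).
\end{aligned}
```

Because each class is described by a full distribution rather than a finite set of samples, expectations over contrastive pairs drawn from it can in principle be taken analytically, which is the sense in which the abstract speaks of generating infinite contrastive pairs.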

Paper Structure (46 sections, 15 equations, 6 figures, 14 tables, 1 algorithm)


Figures (6)

  • Figure 1: Comparison between PATT and other methods on ImageNet-LT. (a) Accuracy on head and tail classes for PATT and other methods. (b), (c) Feature distributions of the top ten classes (Head), the bottom ten classes (Tail), and OOD data for PASCL and PATT.
  • Figure 2: Overview of the proposed framework. The framework consists of a temperature scaling-based implicit semantic augmentation training phase and a feature calibration inference phase. We jointly optimize two complementary terms to encourage desirable hypersphere embeddings: an implicit semantic augmentation contrastive loss to encourage a balanced feature encoder and a temperature scaling-based logit adjustment loss to encourage a balanced high-confidence classifier. Feature calibration fine-tunes features during the inference phase by using an attention weight extracted from the training set, thereby achieving desirable ID classification and OOD detection results.
  • Figure 3: Dependence of OOD, head-class, and tail-class samples on feature channels, showing that the three sample types rely on different feature channels.
  • Figure 4: Confidence distributions for CIFAR10-LT and six OOD test datasets.
  • Figure 5: Confidence frequency for our method and OE. CIFAR10 is used as the ID data and SVHN as the OOD test dataset.
  • ...and 1 more figure