Adaptive Label Smoothing for Out-of-Distribution Detection

Mingle Xu; Jaehwan Lee; Sook Yoon; Dong Sun Park

TL;DR

This paper addresses the poor OOD detection performance of models trained with standard label smoothing (LS) by showing that LS reduces the maximal probability and logits, causing overlap between known and unknown samples. It introduces Adaptive Label Smoothing (ALS), a regularization that keeps the maximal probability unconstrained while forcing non-maximal class probabilities to be equal, implemented as $\mathcal{L}_{ALS} = \mathcal{L}_{MPC} + \lambda \mathcal{L}_{NMPC}$ with $\mathcal{L}_{MPC} = H(\mathbf{y},\mathbf{p})$ and $\mathcal{L}_{NMPC} = \sqrt{\frac{1}{N-1} \sum_{i\neq k} (p_i - \bar{p})^2}$, where $k=\arg\max_i p_i$. Empirically, ALS improves both known-class accuracy and unknown-class discrimination across six datasets and multiple OOD score functions, demonstrating robustness across architectures and suggesting it as a plug-in technique for broader tasks. The results indicate that de-emphasizing the fixed targets for non-true classes while preserving a flexible max-probability target yields clearer margins between known and unknown samples, enhancing practical OOD detection performance.
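The loss above can be sketched for a single sample in plain Python. This is only an illustrative sketch, not the authors' released code. It assumes that $\bar{p}$ is the mean of the non-maximal probabilities and that $N$ is the number of classes. The names `softmax` and `als_loss` and the default `lam=1.0` are hypothetical and chosen for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def als_loss(logits, true_class, lam=1.0):
    """Return (total, mpc, nmpc) for one sample.

    lam plays the role of lambda; its default is an arbitrary placeholder.
    """
    p = softmax(logits)
    n = len(p)
    # L_MPC: cross-entropy H(y, p) with a one-hot target y.
    mpc = -math.log(p[true_class])
    # k = argmax_i p_i (the predicted class, not necessarily the true class).
    k = max(range(n), key=lambda i: p[i])
    rest = [p[i] for i in range(n) if i != k]
    # Assumption: p-bar is the mean of the non-maximal probabilities.
    p_bar = sum(rest) / (n - 1)
    # L_NMPC: standard deviation of the non-maximal probabilities.
    nmpc = math.sqrt(sum((q - p_bar) ** 2 for q in rest) / (n - 1))
    return mpc + lam * nmpc, mpc, nmpc
```

Under these assumptions, the second term vanishes when all non-maximal probabilities are equal, for example for logits `[5.0, 1.0, 1.0, 1.0]`, and the cross-entropy term leaves the maximal probability free to approach 1.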

Abstract

Out-of-distribution (OOD) detection, which aims to distinguish unknown classes from known classes, has received increasing attention recently. A main challenge is that samples from the unknown classes are unavailable during training, and an effective strategy is to improve performance on the known classes. Beneficial strategies such as data augmentation and longer training are thus one way to improve OOD detection. However, label smoothing, an effective method for classifying known classes, degrades OOD detection performance, and this phenomenon remains underexplored. In this paper, we first show that the limited and predefined learning target in label smoothing results in a smaller maximal probability and logit, which in turn leads to worse OOD detection performance. To mitigate this issue, we then propose a novel regularization method, called adaptive label smoothing (ALS), whose core idea is to push the non-true classes to have the same probability while the maximal probability is neither fixed nor limited. Extensive experimental results on six datasets with two backbones suggest that ALS helps both to classify known samples and to discern unknown samples with clear margins. Our code will be available to the public.
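The effect of a predefined target on the maximal probability can be illustrated with a small sketch. It assumes the common uniform form of label smoothing, in which the true class receives $1-\epsilon+\epsilon/N$ and every other class receives $\epsilon/N$. The exact variant analyzed in the paper is not stated here, and the function names are hypothetical.

```python
import math


def smoothed_target(n_classes, true_class, eps=0.1):
    """Uniform label smoothing: (1 - eps) * one_hot + eps / N (assumed form)."""
    uniform = eps / n_classes
    return [
        (1.0 - eps) * (1.0 if i == true_class else 0.0) + uniform
        for i in range(n_classes)
    ]


def cross_entropy(target, probs):
    """H(target, probs) = -sum_i target_i * log(probs_i)."""
    return -sum(t * math.log(q) for t, q in zip(target, probs) if t > 0)


target = smoothed_target(10, true_class=3, eps=0.1)

# The smoothed cross-entropy is minimized at probs == target, so training
# pulls the maximal probability toward max(target) instead of toward 1.
cap = max(target)

# A more confident prediction than the target is penalized.
confident = [0.001] * 10
confident[3] = 0.991
loss_at_target = cross_entropy(target, target)
loss_confident = cross_entropy(target, confident)
```

With these illustrative values, `cap` is about 0.91 and `loss_confident` exceeds `loss_at_target`, which is one way to see why a fixed smoothed target limits the maximal probability.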

Paper Structure (14 sections, 5 equations, 1 figure, 7 tables)

Introduction
Related Work
Regularization
OOD Detection
Method
Formalization and Notation
OOD Detection Risks
Revisiting Label Smoothing
Adaptive Label Smoothing
Experiment
Implementation
Main Result
Analysis on Adaptive Label Smoothing
Conclusion

Figures (1)

Figure 1: The first and second rows visualize the known test samples and all test samples (known and unknown) via t-SNE. The third and fourth rows show the density distribution of the unknown score, with maximal probability and maximal logit as the score functions. Each column shows a method trained on CIFAR10. Zoom in to see the scales.
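The two score functions named in the caption can be sketched as follows. The sign convention (a higher score means "more likely unknown", obtained by negating the confidence) is an assumption made for illustration, and the function names are hypothetical.

```python
import math


def max_probability(logits):
    """Largest softmax probability, computed in a numerically stable way."""
    m = max(logits)
    total = sum(math.exp(z - m) for z in logits)
    return 1.0 / total  # exp(m - m) / total


def max_logit(logits):
    """Largest raw logit."""
    return max(logits)


def unknown_score(logits, kind="probability"):
    """Assumed convention: negate the confidence so higher means more unknown."""
    if kind == "probability":
        return -max_probability(logits)
    if kind == "logit":
        return -max_logit(logits)
    raise ValueError("kind must be 'probability' or 'logit'")


confident_logits = [8.0, 0.0, 0.0]  # looks like a known-class sample
flat_logits = [0.1, 0.0, 0.0]       # low confidence, more likely unknown
```

Under this convention, `flat_logits` receives a higher unknown score than `confident_logits` for both score functions. A detector would then threshold the score, with the threshold chosen on held-out data.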
