Adaptive Label Smoothing for Out-of-Distribution Detection
Mingle Xu, Jaehwan Lee, Sook Yoon, Dong Sun Park
TL;DR
This paper addresses the poor OOD detection performance of models trained with standard label smoothing (LS) by showing that LS reduces the maximal probability and logits, causing overlap between known and unknown samples. It introduces Adaptive Label Smoothing (ALS), a regularization that keeps the maximal probability unconstrained while forcing non-maximal class probabilities to be equal, implemented as $\mathcal{L}_{ALS} = \mathcal{L}_{MPC} + \lambda \mathcal{L}_{NMPC}$ with $\mathcal{L}_{MPC} = H(\mathbf{y},\mathbf{p})$ and $\mathcal{L}_{NMPC} = \sqrt{\frac{1}{N-1} \sum_{i\neq k} (p_i - \bar{p})^2}$, where $k=\arg\max_i p_i$. Empirically, ALS improves both known-class accuracy and unknown-class discrimination across six datasets and multiple OOD score functions, demonstrating robustness across architectures and suggesting it as a plug-in technique for broader tasks. The results indicate that de-emphasizing the fixed targets for non-true classes while preserving a flexible max-probability target yields clearer margins between known and unknown samples, enhancing practical OOD detection performance.
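The loss stated above can be sketched for a single sample in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `softmax` and `als_loss` and the parameter `lam` (standing in for $\lambda$) are hypothetical, $\bar{p}$ is assumed to be the mean of the non-maximal probabilities, and $H(\mathbf{y},\mathbf{p})$ is taken to be cross-entropy against a one-hot target.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def als_loss(logits, true_class, lam=1.0):
    """Sketch of L_ALS = L_MPC + lam * L_NMPC for one sample.

    Assumptions (not confirmed by the source): p_bar is the mean of the
    non-maximal probabilities, and y is a one-hot target.
    """
    p = softmax(logits)
    n = len(p)

    # L_MPC: cross-entropy H(y, p) with a one-hot target reduces to -log p_true.
    l_mpc = -math.log(p[true_class])

    # k = argmax_i p_i; the penalty only looks at the other N - 1 classes.
    k = max(range(n), key=lambda i: p[i])
    others = [p[i] for i in range(n) if i != k]

    # L_NMPC: standard deviation of the non-maximal probabilities,
    # which is zero exactly when they are all equal.
    p_bar = sum(others) / (n - 1)
    l_nmpc = math.sqrt(sum((q - p_bar) ** 2 for q in others) / (n - 1))

    return l_mpc + lam * l_nmpc
```

When the non-maximal probabilities are already equal, as for logits `[2.0, 0.0, 0.0]`, the second term vanishes and only the cross-entropy remains, so nothing in the loss prescribes a specific value for the maximal probability.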
Abstract
Out-of-distribution (OOD) detection, which aims to distinguish unknown classes from known classes, has received increasing attention recently. A main challenge is the unavailability of samples from unknown classes during training, and an effective strategy is to improve performance on the known classes. Using beneficial strategies such as data augmentation and longer training is thus a way to improve OOD detection. However, label smoothing, an effective method for classifying known classes, degrades OOD detection performance, and this phenomenon remains underexplored. In this paper, we first show that the limited and predefined learning target in label smoothing results in a smaller maximal probability and logit, which in turn leads to worse OOD detection performance. To mitigate this issue, we then propose a novel regularization method called adaptive label smoothing (ALS), whose core idea is to push the non-true classes to have the same probability while leaving the maximal probability neither fixed nor limited. Extensive experimental results on six datasets with two backbones suggest that ALS helps both to classify known samples and to discern unknown samples with clear margins. Our code will be available to the public.
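
To make the "limited and predefined learning target" concrete, the sketch below builds a target vector under the standard uniform label smoothing formulation. The smoothing factor `alpha` and the function name `smooth_labels` are illustrative assumptions, and the paper may use a different variant. Because cross-entropy against a fixed target is minimized when the predicted distribution equals that target, the true-class probability is pulled toward $1 - \alpha + \alpha/N$ rather than toward 1.

```python
def smooth_labels(true_class, num_classes, alpha=0.1):
    """Uniform label smoothing: (1 - alpha) * one_hot + alpha / N.

    `alpha` is a hypothetical name for the smoothing factor.
    """
    uniform = alpha / num_classes
    return [
        (1.0 - alpha) + uniform if i == true_class else uniform
        for i in range(num_classes)
    ]


# With N = 10 and alpha = 0.1, the true-class target is 1 - 0.1 + 0.01.
targets = smooth_labels(true_class=0, num_classes=10, alpha=0.1)
```

Every non-true class receives the same fixed target `alpha / num_classes`, and the true class receives a fixed target below 1; this is the predefined ceiling the abstract refers to.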
