The Best of Both Worlds: On the Dilemma of Out-of-distribution Detection

Qingyang Zhang, Qiuxuan Feng, Joey Tianyi Zhou, Yatao Bian, Qinghua Hu, Changqing Zhang

Abstract

Out-of-distribution (OOD) detection is essential for model trustworthiness: a trustworthy model should sensitively identify semantic OOD samples and robustly generalize to covariate-shifted OOD samples. However, we discover that state-of-the-art methods achieve their superior OOD detection performance by silently sacrificing OOD generalization ability. Specifically, the classification accuracy of these models can deteriorate dramatically when they encounter even minor noise. This phenomenon contradicts the goal of model trustworthiness and severely restricts their applicability in real-world scenarios. What is the hidden reason behind this limitation? In this work, we theoretically demystify the "sensitive-robust" dilemma that underlies many existing OOD detection methods, and we derive a theory-inspired algorithm to overcome it. By decoupling the uncertainty learning objective from a Bayesian perspective, the conflict between OOD detection and OOD generalization is naturally harmonized, and dual-optimal performance can be expected. Empirical studies show that our method achieves superior performance on standard benchmarks. To the best of our knowledge, this is the first principled OOD detection method that achieves state-of-the-art OOD detection performance without compromising OOD generalization ability. Our code is available at https://github.com/QingyangZhang/DUL.

Paper Structure

This paper contains 26 sections, 3 theorems, 29 equations, 4 figures, 14 tables, 1 algorithm.

Key Result

Theorem 1

Let $P^{\rm COV}$ and $P^{\rm SEM}_{\rm test}$ be the covariate-shifted OOD and semantic OOD distributions, respectively. ${\rm GError}_{P^{\rm COV}}(f)$ denotes the generalization error, i.e., the expectation of the standard cross-entropy loss over $P^{\rm COV}$. Then we have where $\mathcal{L}_{\rm reg}$ is the OOD detection loss devised for MSP detectors defined in hendrycks2018deep, i.e., cross-entropy between
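To make the quantities named in Theorem 1 concrete, the following is a minimal sketch of an MSP (maximum softmax probability) score together with one common form of OOD detection regularizer. It assumes the regularizer is the cross-entropy between the uniform label distribution and the softmax output, as used in outlier-exposure style training; that form is an assumption here, not something taken from the theorem statement. The function names `log_softmax`, `msp_score`, and `uniform_cross_entropy` are hypothetical and chosen only for illustration.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]


def msp_score(logits):
    """Maximum softmax probability; a higher value is read as 'more ID-like'."""
    return math.exp(max(log_softmax(logits)))


def uniform_cross_entropy(logits):
    """Cross-entropy between the uniform distribution and the softmax output.

    Assumed form of the regularizer: it is smallest when the prediction
    is uniform over the classes, so minimizing it on auxiliary outliers
    pushes their MSP score down.
    """
    log_probs = log_softmax(logits)
    return -sum(log_probs) / len(log_probs)


# Hypothetical logits for a 3-class problem.
confident = [6.0, 0.5, 0.2]   # peaked prediction
flat = [1.0, 1.0, 1.0]        # uniform prediction

print(msp_score(confident), uniform_cross_entropy(confident))
print(msp_score(flat), uniform_cross_entropy(flat))
```

The peaked prediction has a high MSP score and a large regularizer value, while the uniform prediction attains the minimum of the regularizer, which equals the logarithm of the number of classes.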

Figures (4)

  • Figure 1: (a) Models trained on in-distribution (ID) data inevitably encounter distribution shifts during deployment. OOD generalization expects the model to correctly classify covariate-shifted data affected by noise or corruption from environmental factors. OOD detection aims to identify samples that do not belong to any known class, for the sake of trustworthiness. (b) Limitations of current advanced OOD detection methods. We consider 8 representative OOD detection methods: the baseline MSP hendrycks2016baseline (without any OOD detection regularization), Entropy hendrycks2018deep, EBM liu2020energy, Bayesian malinin2018predictive, the SOTA OOD detection methods WOODS katz2022training and POEM ming2022poem, the recent SCONE bai2023feed, which seeks a good trade-off, and the proposed DUL. All of these methods except DUL exhibit degraded generalization ability compared to the MSP baseline and lie in a trade-off area. The goal of this paper is to understand and mitigate this phenomenon.
  • Figure 2: Visualization of different types of uncertainty estimated by DUL.
  • Figure 3: Visualization of different types of uncertainty on a semantic OOD test dataset (i.e., Textures) when CIFAR-10 is the ID dataset. Without DUL (orange), all three types of uncertainty increase together on OOD data. In contrast, DUL (green) increases the distributional uncertainty but decreases the data uncertainty on OOD data, which leaves the overall uncertainty unchanged.
  • Figure 4: Semantic OOD samples can be very similar to ID.
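The three uncertainty types named in the Figure 3 caption can be illustrated with the standard entropy-based decomposition used in the Bayesian uncertainty literature (e.g., malinin2018predictive): overall uncertainty is the entropy of the averaged prediction, data uncertainty is the average entropy of the individual predictions, and distributional uncertainty is their difference. The sketch below is an assumption-laden illustration of that arithmetic only; it is not DUL's training procedure, and the function names `entropy` and `decompose`, as well as the toy predictions, are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def decompose(sampled_probs):
    """Split overall uncertainty into data and distributional parts.

    `sampled_probs` is a list of categorical predictions for one input,
    e.g. draws from a distribution over predictions (assumed setup).
    Returns (overall, data, distributional).
    """
    n = len(sampled_probs)
    k = len(sampled_probs[0])
    mean = [sum(p[c] for p in sampled_probs) / n for c in range(k)]
    overall = entropy(mean)                           # entropy of the mean
    data = sum(entropy(p) for p in sampled_probs) / n  # mean of the entropies
    distributional = overall - data                   # mutual information
    return overall, data, distributional


# Two toy cases with the same overall uncertainty but a different split.
agree = [[0.5, 0.5], [0.5, 0.5]]      # every sample is itself uncertain
disagree = [[1.0, 0.0], [0.0, 1.0]]   # samples are confident but conflict

print(decompose(agree))
print(decompose(disagree))
```

In the first case all of the overall uncertainty is data uncertainty; in the second case all of it is distributional. This mirrors the behavior described in the caption, where distributional uncertainty can rise and data uncertainty can fall while the overall uncertainty stays the same.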

Theorems & Definitions (10)

  • Definition 1: Disparity with Total Variation Distance
  • Definition 2: Disparity Discrepancy with Total Variation Distance, DD with TVD
  • Theorem 1: Sensitive-robust dilemma
  • Definition 3: Disparity with Total Variation Distance
  • Definition 4: Disparity Discrepancy with Total Variation Distance, DD with TVD
  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • proof