Out-of-Distribution Detection from Small Training Sets using Bayesian Neural Network Classifiers

Kevin Raina, Tanya Schmah

TL;DR

The paper tackles OOD detection under limited training data by leveraging Bayesian Neural Networks to quantify model uncertainty and by designing post-hoc OOD scores that operate in logit space. It introduces a new logit-space k-NN score, plus a class-conditioned variant, and demonstrates that Bayesian adaptations of deterministic scores outperform their non-Bayesian counterparts on MNIST and CIFAR-10 with small data. The results show that logit-based Bayesian scores, particularly the EL kNN+ variant, provide robust OOD discrimination, while prior expressivity influences the relative performance of predictive entropy versus mutual information. Overall, the work highlights the practical value of uncertainty-aware, prior-informed models for reliable OOD detection in low-data regimes and points to logit-space representations as a rich signal for uncertainty.
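
The two uncertainty scores contrasted above, predictive entropy and mutual information, can both be computed from posterior samples of the logits. The following is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the function names and the `(S, N, C)` array layout (S posterior samples, N inputs, C classes) are hypothetical choices made for illustration.

```python
import numpy as np

def softmax(logits, axis=-1):
    """Numerically stable softmax."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def entropy(p, axis=-1, eps=1e-12):
    """Shannon entropy in nats; eps guards against log(0)."""
    return -(p * np.log(p + eps)).sum(axis=axis)

def predictive_entropy_and_mi(logit_samples):
    """Assumed input layout: logit_samples has shape (S, N, C).

    Returns (predictive entropy, mutual information), each of shape (N,).
    """
    probs = softmax(logit_samples)                   # (S, N, C)
    mean_probs = probs.mean(axis=0)                  # posterior predictive, (N, C)
    pe = entropy(mean_probs)                         # total uncertainty
    expected_entropy = entropy(probs).mean(axis=0)   # average per-sample entropy
    mi = pe - expected_entropy                       # epistemic part
    return pe, mi
```

Mutual information is near zero when all posterior samples agree on the prediction, and grows when the samples are individually confident but disagree with one another.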

Abstract

Out-of-Distribution (OOD) detection is critical to AI reliability and safety, yet in many practical settings, only a limited amount of training data is available. Bayesian Neural Networks (BNNs) are a promising class of model on which to base OOD detection, because they explicitly represent epistemic (i.e. model) uncertainty. In the small training data regime, BNNs are especially valuable because they can incorporate prior model information. We introduce a new family of Bayesian post-hoc OOD scores based on expected logit vectors, and compare 5 Bayesian and 4 deterministic post-hoc OOD scores. Experiments on MNIST and CIFAR-10 In-Distributions, with 5000 training samples or fewer, show that the Bayesian methods outperform corresponding deterministic methods.
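
To make the idea of a score based on expected logit vectors concrete, here is a rough sketch of a k-nearest-neighbour score in logit space, with a class-conditioned variant. It rests on several assumptions that the summary above does not confirm: the expected logit vector is taken as the mean over posterior samples, the score is the Euclidean distance to the k-th nearest training vector, and the class-conditioned variant restricts neighbours to training points with the same predicted class. All function names and the default `k=5` are hypothetical.

```python
import numpy as np

def expected_logits(logit_samples):
    """Average logits over posterior samples: (S, N, C) -> (N, C)."""
    return logit_samples.mean(axis=0)

def knn_score(train_z, test_z, k=5):
    """Distance from each test vector to its k-th nearest training vector.

    Larger values suggest the input is further from the training data.
    """
    # Pairwise Euclidean distances, shape (N_test, N_train).
    d = np.linalg.norm(test_z[:, None, :] - train_z[None, :, :], axis=-1)
    k = min(k, train_z.shape[0])
    return np.sort(d, axis=1)[:, k - 1]

def class_conditioned_knn_score(train_z, test_z, k=5):
    """Assumed variant: search only among training points whose
    predicted class (argmax logit) matches that of the test point."""
    train_pred = train_z.argmax(axis=1)
    test_pred = test_z.argmax(axis=1)
    scores = np.full(test_z.shape[0], np.inf)
    for c in np.unique(test_pred):
        ref = train_z[train_pred == c]
        idx = test_pred == c
        if ref.shape[0] > 0:
            scores[idx] = knn_score(ref, test_z[idx], k)
    return scores
```

In this sketch a test point whose predicted class never occurs among the training predictions keeps a score of infinity, which treats it as maximally out-of-distribution; other conventions are possible.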

Paper Structure

This paper contains 30 sections, 4 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: ROC curves comparing the OOD detection performance for MNIST experiments across four datasets and using three uncertainty estimation methods. Results are shown for a dataset size of 5000.
  • Figure 2: Density plots comparing the logits of MNIST test samples predicted as class 2 (blue) and as the remaining classes (black) with those of two FashionMNIST samples (red and green) across different logit dimensions. The x-axis represents logit values for the selected dimension (2), while the y-axis varies across subplots. The two specific FashionMNIST samples used are illustrated below the main figure. Note in particular that the red point (predicted label 2) has a lower likelihood with respect to the multivariate distribution of the blue points (see in particular the logit 0 and 8 directions) than its likelihood with respect to the marginal distribution for logit 2. This illustrates that multivariate likelihood may better distinguish ID and OOD examples than the marginal likelihood of a single logit.
  • Figure 3: Radar plots of false positive rates (FPR) at 90 percent, 95 percent, and 99 percent thresholds for Out-of-Distribution (OOD) detection methods in the informed prior setting.
  • Figure 4: ROC curves comparing OOD detection performance across three dataset sizes (1000, 5000, 10000) for four datasets (FashionMNIST, KuzushijiMNIST, Omniglot, notMNIST). Three uncertainty estimation methods (Predictive Entropy, Softmax Entropy, and Mutual Information) are evaluated. Each plot shows how performance varies with dataset size, with distinct line styles representing different methods. A higher curve signifies better discrimination between In-Distribution and OOD data.
  • Figure 5: Density plots showing the score distributions of In-Distribution (MNIST, blue) and Out-of-Distribution (FashionMNIST, red) samples across different kNN-based methods: EL kNN, MLE kNN, EL kNN+, and MLE kNN+. The plots illustrate how the score distributions for each method compare between the two datasets.
  • ...and 1 more figure
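
The false positive rates reported in Figure 3 can be illustrated with a short sketch. It assumes, without confirmation from the caption, that the 90, 95, and 99 percent thresholds are true positive rates on In-Distribution data, that ID is the positive class, and that a higher score means "more OOD". The function name `fpr_at_tpr` is hypothetical.

```python
import numpy as np

def fpr_at_tpr(id_scores, ood_scores, tpr=0.95):
    """Fraction of OOD samples accepted as ID when the threshold is set
    so that a fraction `tpr` of ID samples is accepted.

    Assumes higher score = more likely OOD.
    """
    id_scores = np.asarray(id_scores, dtype=float)
    ood_scores = np.asarray(ood_scores, dtype=float)
    # Accept as ID everything at or below this ID-score quantile.
    threshold = np.quantile(id_scores, tpr)
    return float(np.mean(ood_scores <= threshold))
```

With perfectly separated scores the function returns 0, and when the OOD scores have the same distribution as the ID scores it returns roughly the chosen TPR level.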