Probabilistic Trust Intervals for Out of Distribution Detection

Gagandeep Singh, Ishan Mishra, Deepak Mishra

TL;DR

The paper tackles reliable OOD detection for pre-trained neural networks without modifying architecture or relying on OOD data. It introduces probabilistic trust intervals around each weight, enabling weight perturbations that generate sibling networks whose outputs are compared via a novel agreement measure $M$ to detect OOD inputs. By optimizing interval sizes with a loss $\mathcal{L}(D,\sigma)$ that balances ID accuracy and weight variability, the approach maintains performance on in-distribution data while boosting detection of out-of-distribution, corrupted, and adversarial inputs. Empirical results across MNIST, CIFAR-10/100, SVHN, Fashion-MNIST, and CIFAR-10-C show competitive or superior OOD metrics (FPR@TPR, AUPR) with constant memory overhead, highlighting the method's practicality for scalable deployment.
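The sibling-network idea can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: a single linear softmax layer stands in for the pre-trained network, weights are perturbed with Gaussian noise scaled by a per-weight `sigma`, and the squared-difference score is a simple stand-in, not the paper's agreement measure $M$ or its loss $\mathcal{L}(D,\sigma)$.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def sibling_outputs(x, W, b, sigma, n_siblings=8, rng=None):
    """Outputs of sibling networks obtained by perturbing W.

    Assumption: each weight is sampled from a Gaussian centred on its
    original value, with a per-weight standard deviation `sigma` playing
    the role of the trust-interval size.
    """
    rng = np.random.default_rng() if rng is None else rng
    outs = []
    for _ in range(n_siblings):
        W_sib = W + sigma * rng.standard_normal(W.shape)
        outs.append(softmax(x @ W_sib + b))
    return np.stack(outs)  # shape: (n_siblings, batch, classes)


def disagreement(x, W, b, sigma, n_siblings=8, rng=None):
    """Hypothetical stand-in score: mean squared deviation of sibling
    outputs from the unperturbed network's output, one value per input."""
    reference = softmax(x @ W + b)
    siblings = sibling_outputs(x, W, b, sigma, n_siblings, rng)
    return np.mean(np.sum((siblings - reference) ** 2, axis=-1), axis=0)


rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))       # toy "pre-trained" weights
b = np.zeros(3)
sigma = np.full_like(W, 0.1)          # equal interval sizes before any tuning
x = rng.standard_normal((5, 4))       # five toy inputs
scores = disagreement(x, W, b, sigma, rng=rng)
```

Inputs with a higher score are those on which the siblings drift further from the original network; thresholding such a score is what would flag an input as OOD. The original weights `W` are never modified, matching the claim that the pre-trained network is left untouched.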

Abstract

The ability of a deep learning network to distinguish between in-distribution (ID) and out-of-distribution (OOD) inputs is crucial for ensuring the reliability and trustworthiness of AI systems. Existing OOD detection methods often involve complex architectural innovations, such as ensemble models, which, while enhancing detection accuracy, significantly increase model complexity and training time. Other methods utilize surrogate samples to simulate OOD inputs, but these may not generalize well across different types of OOD data. In this paper, we propose a straightforward yet novel technique to enhance OOD detection in pre-trained networks without altering their original parameters. Our approach defines probabilistic trust intervals for each network weight, determined using in-distribution data. During inference, additional weight values are sampled, and the resulting disagreements among outputs are utilized for OOD detection. We propose a metric to quantify this disagreement and validate its effectiveness with empirical evidence. Our method significantly outperforms various baseline methods across multiple OOD datasets without requiring actual or surrogate OOD samples. We evaluate our approach on MNIST, Fashion-MNIST, CIFAR-10, CIFAR-100 and CIFAR-10-C (a corruption-augmented version of CIFAR-10), across various neural network architectures (e.g., VGG-16, ResNet-20, DenseNet-100). On the MNIST-FashionMNIST setup, our method achieves a False Positive Rate (FPR) of 12.46% at 95% True Positive Rate (TPR), compared to 27.09% achieved by the best baseline. On adversarial and corrupted datasets such as CIFAR-10-C, our proposed method easily differentiates between clean and noisy inputs. These results demonstrate the robustness of our approach in identifying corrupted and adversarial inputs, all without requiring OOD samples during training.
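For readers unfamiliar with the FPR-at-95%-TPR figure quoted above, the following is a minimal sketch of how such a number is commonly computed. It assumes one common convention (ID samples are the positive class, and a higher score means "more OOD-like"); the paper's exact evaluation protocol may differ, and the function name `fpr_at_tpr` is hypothetical.

```python
import math

import numpy as np


def fpr_at_tpr(id_scores, ood_scores, tpr=0.95):
    """Fraction of OOD inputs wrongly accepted as ID when the threshold
    is set so that at least `tpr` of the ID inputs are accepted.

    Assumed convention: an input is accepted as ID when its score is
    less than or equal to the threshold (higher score = more OOD-like).
    """
    id_sorted = np.sort(np.asarray(id_scores, dtype=float))
    ood = np.asarray(ood_scores, dtype=float)
    # Smallest threshold that keeps at least `tpr` of the ID scores.
    k = max(math.ceil(tpr * len(id_sorted)) - 1, 0)
    threshold = id_sorted[k]
    return float(np.mean(ood <= threshold))


# Toy scores, for illustration only (not results from the paper).
id_scores = np.arange(100)                 # ID scores 0..99
ood_scores = np.array([50, 90, 150, 200])  # two of four fall below the threshold
rate = fpr_at_tpr(id_scores, ood_scores)
```

Under this convention a lower value is better: it means fewer OOD inputs slip through while 95% of ID inputs are still accepted.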

Paper Structure

This paper contains 11 sections, 6 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: Illustration of Probabilistic Trust Intervals for randomly picked weights of neural classifier before (red) and after (green) training. Trust intervals before training are of the same size for each weight and get adjusted to ID data after training to maintain the generalization abilities of the underlying neural network while enabling OOD detection.
  • Figure 2: Distribution of quadratic approximation of error for $\mathbb{C}_{1}$ architecture with MNIST dataset as ID data and Fashion-MNIST as OOD data. It can be seen that for ID samples the values are close to zero, whereas for OOD they are spread over a considerably larger range.
  • Figure 3: The change in test accuracy with updates. In the left plot, $\pi_{1}$ is varied while $\pi_{2}$ is fixed at $10^{-6}$; in the right plot, $\pi_{2}$ is varied while $\pi_{1}$ is fixed at 1.
  • Figure 4: Distribution of $\log{(M)}$ values for the original clean images (in green) and their Gaussian-noise-corrupted versions (top row, in red) and speckle-noise-corrupted versions (bottom row, in red) obtained from the CIFAR-10-C dataset. The distributions for clean and noisy samples are clearly distinct.
  • Figure 5: Distribution of $\log{(M)}$ values for the ${\mathbb{C}_{1}}$ architecture with the MNIST dataset as ID data (solid lines) and Fashion-MNIST as OOD data (dashed lines) for different numbers of weight samples. The distributions almost overlap across the different numbers of samples.
  • ...and 1 more figure