Probabilistic Trust Intervals for Out of Distribution Detection
Gagandeep Singh, Ishan Mishra, Deepak Mishra
TL;DR
The paper tackles reliable OOD detection for pre-trained neural networks without modifying the architecture or relying on OOD data. It introduces probabilistic trust intervals around each weight. Perturbing weights within these intervals generates sibling networks, whose outputs are compared via a novel agreement measure $M$ to detect OOD inputs. The interval sizes are optimized with a loss $\mathcal{L}(D,\sigma)$ that balances ID accuracy against weight variability. As a result, the approach maintains performance on in-distribution data while improving detection of out-of-distribution, corrupted, and adversarial inputs. Empirical results across MNIST, CIFAR-10/100, SVHN, Fashion-MNIST, and CIFAR-10-C show competitive or superior OOD metrics (FPR at 95% TPR, AUPR) with constant memory overhead, highlighting the method's practicality for scalable deployment.
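The mechanism summarized above can be illustrated with a small NumPy sketch: sample perturbed sibling networks, then score how much they agree. Every detail here is an assumption for illustration, not the authors' definition:

- A linear softmax classifier stands in for the pre-trained network.
- Sibling weights are drawn uniformly from the interval $[w - \sigma, w + \sigma]$ around each weight.
- The hypothetical `agreement` score is the fraction of siblings that reproduce the original prediction. It is a stand-in for the paper's measure $M$.
- The hypothetical `interval_loss` (weighted by a made-up parameter `lam`) only mimics the stated trade-off in $\mathcal{L}(D,\sigma)$ between ID accuracy and interval width.

```python
import numpy as np


def predict(W, b, X):
    """Class predictions of a linear softmax classifier (argmax of the logits)."""
    return np.argmax(X @ W + b, axis=1)


def sample_siblings(W, sigma, k, rng):
    """Draw k sibling weight matrices uniformly from [W - sigma, W + sigma].

    Uniform sampling inside each weight's interval is an assumption of this sketch.
    """
    eps = rng.uniform(-1.0, 1.0, size=(k,) + W.shape)
    return W + sigma * eps


def agreement(W, b, sigma, X, k=32, rng=None):
    """Hypothetical agreement score (not the paper's M): for each input, the
    fraction of siblings whose prediction matches the original network's."""
    rng = np.random.default_rng(0) if rng is None else rng
    base = predict(W, b, X)
    siblings = sample_siblings(W, sigma, k, rng)
    votes = np.stack([predict(Wk, b, X) == base for Wk in siblings])
    return votes.mean(axis=0)  # 1.0 = all siblings agree


def interval_loss(W, b, sigma, X, y, lam=0.1, k=8, rng=None):
    """Hypothetical stand-in for L(D, sigma): mean cross-entropy of sibling
    networks on ID data minus lam * mean(log sigma). The second term rewards
    wider intervals; the first keeps them from hurting ID accuracy."""
    rng = np.random.default_rng(0) if rng is None else rng
    ce = 0.0
    for Wk in sample_siblings(W, sigma, k, rng):
        logits = X @ Wk + b
        logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
        log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        ce += -log_probs[np.arange(len(y)), y].mean()
    return ce / k - lam * np.log(sigma).mean()


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    W, b = rng.normal(size=(4, 3)), np.zeros(3)
    X = rng.normal(size=(10, 4))
    scores = agreement(W, b, np.full_like(W, 0.05), X)
    print(scores)
```

In this sketch, inputs whose agreement falls below a threshold chosen on ID data would be flagged as OOD. With $\sigma = 0$, every sibling equals the original network and agreement is always 1.0.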
Abstract
The ability of a deep learning network to distinguish between in-distribution (ID) and out-of-distribution (OOD) inputs is crucial for ensuring the reliability and trustworthiness of AI systems. Existing OOD detection methods often involve complex architectural innovations, such as ensemble models, which enhance detection accuracy but significantly increase model complexity and training time. Other methods use surrogate samples to simulate OOD inputs, but these may not generalize well across different types of OOD data. In this paper, we propose a straightforward yet novel technique to enhance OOD detection in pre-trained networks without altering their original parameters. Our approach defines probabilistic trust intervals for each network weight, determined using in-distribution data. During inference, additional weight values are sampled from these intervals, and the resulting disagreements among outputs are used for OOD detection. We propose a metric to quantify this disagreement and validate its effectiveness with empirical evidence. Our method significantly outperforms various baseline methods across multiple OOD datasets without requiring actual or surrogate OOD samples. We evaluate our approach on MNIST, Fashion-MNIST, CIFAR-10, CIFAR-100, and CIFAR-10-C (a corruption-augmented version of CIFAR-10), across various neural network architectures (e.g., VGG-16, ResNet-20, DenseNet-100). On the MNIST-FashionMNIST setup, our method achieves a False Positive Rate (FPR) of 12.46\% at 95\% True Positive Rate (TPR), compared to 27.09\% achieved by the best baseline. On adversarial and corrupted datasets such as CIFAR-10-C, our method easily differentiates between clean and noisy inputs. These results demonstrate the robustness of our approach in identifying corrupted and adversarial inputs, all without requiring OOD samples during training.
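The headline metric above, FPR at 95% TPR, can be computed as in the following sketch. It assumes two common conventions. First, ID inputs are the positive class. Second, a higher score means "more in-distribution", for example a higher sibling agreement. The function name `fpr_at_tpr` is hypothetical. The threshold is taken as the 5% quantile of the ID scores, so roughly 95% of ID inputs are accepted.

```python
import numpy as np


def fpr_at_tpr(id_scores, ood_scores, tpr=0.95):
    """Fraction of OOD inputs accepted as ID at the threshold where `tpr`
    of ID inputs are accepted.

    Assumed convention: ID is the positive class, higher score = more ID.
    """
    id_scores = np.asarray(id_scores, dtype=float)
    ood_scores = np.asarray(ood_scores, dtype=float)
    # The (1 - tpr) quantile of the ID scores leaves about `tpr` of them
    # at or above the threshold.
    thresh = np.quantile(id_scores, 1.0 - tpr)
    # False positives: OOD inputs that clear the same threshold.
    return float(np.mean(ood_scores >= thresh))
```

Lower is better. A value of 12.46% means that roughly one in eight OOD inputs is still accepted as ID at the threshold that keeps 95% of ID inputs.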
