Extracting Usable Predictions from Quantized Networks through Uncertainty Quantification for OOD Detection
Rishi Singhal, Srinath Srinivasan
TL;DR
This work tackles out-of-distribution (OOD) detection in quantized vision models under resource constraints. It introduces an uncertainty quantification pipeline that applies inference-time Monte Carlo dropout to a fine-tuned backbone followed by post-training int8 quantization. The pipeline computes per-class confidence intervals under a Gaussian assumption, $ (\mu - Z\sigma, \mu + Z\sigma) $, and uses a configurable conf_factor to decide which predictions to keep and which uncertain samples to discard. By filtering out confusing inputs, the approach yields usable predictions, improves F1 scores on CIFAR-100 and CIFAR-100C, and achieves substantial model compression (~4x reduction in size) at the expense of increased inference time. The framework is practically significant for safety-critical tasks where reliable decision-making must be balanced against resource limitations.
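The interval-based filtering step can be illustrated with a minimal NumPy sketch. It rests on assumptions not confirmed by the summary above: that conf_factor plays the role of $Z$, that $\mu$ and $\sigma$ are the per-class mean and standard deviation of the softmax outputs across stochastic dropout passes, and that a sample is kept only when the top class's interval lies entirely above every other class's interval. The function name `filter_predictions` and the `-1` marker for discarded samples are hypothetical.

```python
import numpy as np

def filter_predictions(probs, conf_factor=1.96):
    """Keep only predictions whose top-class interval does not overlap any other.

    probs: array of shape (T, N, C) holding softmax outputs from T stochastic
           (Monte Carlo dropout) forward passes over N samples and C classes.
    conf_factor: assumed to be the Z multiplier in (mu - Z*sigma, mu + Z*sigma).
    Returns (preds, confident); preds is -1 for discarded samples (hypothetical marker).
    """
    mu = probs.mean(axis=0)              # (N, C) per-class mean over passes
    sigma = probs.std(axis=0)            # (N, C) per-class std over passes
    lower = mu - conf_factor * sigma     # interval lower bounds
    upper = mu + conf_factor * sigma     # interval upper bounds

    rows = np.arange(mu.shape[0])
    top = mu.argmax(axis=1)              # candidate class per sample
    top_lower = lower[rows, top]

    # Highest upper bound among all classes other than the candidate.
    upper_others = upper.copy()
    upper_others[rows, top] = -np.inf
    runner_up_upper = upper_others.max(axis=1)

    # Assumed decision rule: accept only if the intervals are disjoint.
    confident = top_lower > runner_up_upper
    preds = np.where(confident, top, -1)
    return preds, confident

# Toy input: 3 passes, 2 samples, 3 classes.
# Sample 0 is stable across passes; sample 1 flips between classes 0 and 1.
demo_probs = np.array([
    [[0.90, 0.05, 0.05], [0.50, 0.40, 0.10]],
    [[0.90, 0.05, 0.05], [0.30, 0.60, 0.10]],
    [[0.90, 0.05, 0.05], [0.45, 0.45, 0.10]],
])
demo_preds, demo_confident = filter_predictions(demo_probs)
```

In this toy input the stable sample is kept with class 0, while the sample whose top class changes between passes is discarded. Under the assumed rule, a larger conf_factor widens the intervals and so discards more samples.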
Abstract
Out-of-distribution (OOD) detection has become more pertinent with advances in network design and increased task complexity. Identifying which parts of the data a given network misclassifies has become as valuable as the network's overall performance. Quantization compresses the model, but at the cost of a minor loss in performance. This loss makes it all the more necessary to derive confidence estimates for the network's predictions. In line with this thinking, we introduce an Uncertainty Quantification (UQ) technique to quantify the uncertainty in the predictions of a pre-trained vision model. We then use this information to extract valuable predictions while ignoring the non-confident ones. We observe that up to 80% of the samples our technique ignores would otherwise have been misclassified. The code is available here.
