Enhancing Monte Carlo Dropout Performance for Uncertainty Quantification
Hamzeh Asgharnezhad, Afshar Shamsi, Roohallah Alizadehsani, Arash Mohammadi, Hamid Alinejad-Rokny
TL;DR
This work tackles poorly calibrated uncertainty estimates in Monte Carlo Dropout (MCD) by introducing an uncertainty-aware loss that incorporates predictive entropy (PE) and by jointly optimizing model weights and hyperparameters with three search strategies: Grey Wolf Optimizer (GWO), Bayesian Optimization (BO), and Particle Swarm Optimization (PSO). The framework is evaluated on the synthetic Circles dataset and on real-world datasets (Myocarditis, Cats vs. Dogs, Wisconsin) with DenseNet121, ResNet50, and VGG16 backbones. It improves on the MCD baseline by about 2-3% on average in both conventional accuracy and Uncertainty Accuracy (UAcc), with substantially better calibration as measured by Expected Calibration Error (ECE). Tuning hyperparameters such as dropout rates and hidden-layer sizes while training with the entropy-aware loss aligns predictive confidence with correctness, which makes the resulting uncertainty estimates more trustworthy. The results suggest meaningful gains in the reliability of DNN uncertainty estimates for safety-critical applications such as medical imaging and autonomous systems, and may also be relevant to deployment under distribution shift and data variability.
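To make the two quantities named above concrete, here is a minimal sketch of predictive entropy and a binned Expected Calibration Error in plain Python. It uses the standard textbook definitions, not necessarily the exact variants used in the paper; the function names, the equal-width binning, and the default of 10 bins are assumptions made for illustration.

```python
import math

def predictive_entropy(mean_probs):
    """Entropy (in nats) of a mean predictive distribution over classes.

    mean_probs is assumed to be the class-probability vector averaged
    over several stochastic forward passes.
    """
    return -sum(p * math.log(p) for p in mean_probs if p > 0.0)

def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: sample-weighted mean of |accuracy - confidence| per bin.

    Assumption: equal-width bins [k/n_bins, (k+1)/n_bins), with a
    confidence of exactly 1.0 placed in the last bin.
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        acc = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(acc - avg_conf)
    return ece
```

A uniform two-class prediction has the maximum entropy of ln 2, while a one-hot prediction has entropy 0. A model whose 75%-confidence predictions are right three times out of four has an ECE of 0 on those samples.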
Abstract
Knowing the uncertainty associated with the output of a deep neural network is of paramount importance in making trustworthy decisions, particularly in high-stakes fields like medical diagnosis and autonomous systems. Monte Carlo Dropout (MCD) is a widely used method for uncertainty quantification, as it can be easily integrated into various deep architectures. However, conventional MCD often struggles to provide well-calibrated uncertainty estimates. To address this, we introduce frameworks that enhance MCD by integrating different search methods, namely Grey Wolf Optimizer (GWO), Bayesian Optimization (BO), and Particle Swarm Optimization (PSO), together with an uncertainty-aware loss function, thereby improving the reliability of uncertainty quantification. We conduct comprehensive experiments using different backbones, namely DenseNet121, ResNet50, and VGG16, on various datasets, including Cats vs. Dogs, Myocarditis, Wisconsin, and a synthetic dataset (Circles). Our proposed algorithm outperforms the MCD baseline by 2-3% on average in terms of both conventional accuracy and uncertainty accuracy while achieving significantly better calibration. These results highlight the potential of our approach to enhance the trustworthiness of deep learning models in safety-critical applications.
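For readers unfamiliar with the MCD baseline, the following is a minimal sketch of the general technique, not the authors' implementation: dropout stays active at inference, several stochastic forward passes are averaged, and the entropy of the averaged prediction serves as the uncertainty score. The tiny two-layer network, its weights, and all function and parameter names are hypothetical and chosen only to keep the example self-contained.

```python
import math
import random

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def stochastic_forward(x, w_hidden, w_out, drop_rate, rng):
    """One forward pass of a toy MLP with dropout left switched on."""
    # ReLU hidden layer (no biases, for brevity).
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    # Inverted dropout: keep each unit with probability `keep`, rescale survivors.
    keep = 1.0 - drop_rate
    hidden = [h / keep if rng.random() < keep else 0.0 for h in hidden]
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in w_out]
    return softmax(logits)

def mc_dropout_predict(x, w_hidden, w_out, drop_rate=0.5, n_passes=100, seed=0):
    """Average class probabilities over n_passes stochastic passes.

    Returns (mean_probs, predictive_entropy_in_nats).
    """
    rng = random.Random(seed)
    mean = [0.0] * len(w_out)
    for _ in range(n_passes):
        probs = stochastic_forward(x, w_hidden, w_out, drop_rate, rng)
        mean = [m + p / n_passes for m, p in zip(mean, probs)]
    entropy = -sum(p * math.log(p) for p in mean if p > 0.0)
    return mean, entropy

# Hypothetical weights: 2 inputs -> 3 hidden units -> 2 classes.
W_HIDDEN = [[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]]
W_OUT = [[1.0, 0.0, -1.0], [-1.0, 0.0, 1.0]]

mean_probs, entropy = mc_dropout_predict([1.0, 0.2], W_HIDDEN, W_OUT,
                                         drop_rate=0.5, n_passes=200, seed=1)
```

In this sketch the dropout rate and the hidden-layer width are exactly the kind of hyperparameters that a search method such as GWO, BO, or PSO could tune; the search itself is not shown here.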
