Is MC Dropout Bayesian?
Loic Le Folgoc, Vasileios Baltatzis, Sujal Desai, Anand Devaraj, Sam Ellis, Octavio E. Martinez Manzanera, Arjun Nair, Huaqi Qiu, Julia Schnabel, Ben Glocker
TL;DR
The paper examines whether MC Dropout provides a faithful Bayesian treatment of uncertainty quantification in neural networks, particularly in medical imaging. It shows that MC dropout yields a multimodal posterior made of Dirac deltas, which assigns zero probability to the true model on simple closed-form benchmarks and so challenges its Bayesian interpretation. The authors introduce a generic variational inference (VI) engine implemented in PyTorch, with structured (diagonal plus low-rank) normal variational families (sN-VI) and mixtures thereof (sGMM-VI), designed to overcome the limitations of mean-field VI. Through Gaussian and RBF regression examples, they show that MC dropout can be misleading, whereas structured VI provides more faithful posterior approximations, offering a practical "no-free-lunch" alternative for uncertainty quantification.
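To make the zero-probability claim concrete: for fixed variational parameters, MC dropout draws a weight vector as w = m ⊙ z, where m holds the learned weights and each mask entry z_i is Bernoulli. Its distribution over weights therefore has at most 2^D atoms. The NumPy sketch below is an illustration under assumptions, not the authors' code. It assumes the common formulation W = M·diag(z) without rescaling, and `dropout_support` is a hypothetical helper. It enumerates these atoms and shows that a weight vector drawn from a continuous distribution gets probability 0.

```python
import itertools
import numpy as np

def dropout_support(m, p):
    """Enumerate every weight vector reachable by dropout on weights m.

    Hypothetical helper. Assumes w = m * z with z_i ~ Bernoulli(1 - p)
    (i.e. each weight is dropped with probability p, no rescaling).
    Returns a dict mapping each reachable vector (as a tuple) to its probability.
    """
    support = {}
    for z in itertools.product([0, 1], repeat=len(m)):
        z = np.array(z)
        w = tuple(m * z)
        # Probability of this particular mask: (1 - p) for kept, p for dropped.
        prob = float(np.prod(np.where(z == 1, 1.0 - p, p)))
        support[w] = support.get(w, 0.0) + prob
    return support

rng = np.random.default_rng(0)
m = rng.normal(size=3)            # "learned" weights, all nonzero almost surely
support = dropout_support(m, p=0.5)
print(len(support))               # 8 atoms = 2**3
print(sum(support.values()))      # 1.0: all mass sits on these 8 points

w_true = rng.normal(size=3)       # a "true" weight vector from a continuous prior
print(support.get(tuple(w_true), 0.0))  # 0.0: outside the finite support
```

Because the support is a finite set, any true weight vector that is not exactly one of the 2^D atoms receives zero probability, whatever the values of m and p.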
Abstract
MC Dropout is a mainstream "free lunch" method in medical imaging for approximate Bayesian computation (ABC). Its appeal is threefold: it solves the daunting task of ABC and uncertainty quantification in neural networks (NNs) out of the box; it falls within the variational inference (VI) framework; and it offers a highly multimodal, faithful predictive posterior. We question these properties for approximate inference. MC Dropout in fact changes the Bayesian model; its predictive posterior assigns $0$ probability to the true model on closed-form benchmarks; and the multimodality of its predictive posterior is a design artefact rather than a property of the true predictive posterior. To address the need for VI on arbitrary models, we share a generic VI engine within the PyTorch framework. The code includes a carefully designed implementation of structured (diagonal plus low-rank) multivariate normal variational families, and mixtures thereof. It is intended as a go-to no-free-lunch approach that addresses the shortcomings of mean-field VI, with an adjustable trade-off between expressivity and computational complexity.
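A diagonal plus low-rank normal family uses the covariance $\Sigma = \mathrm{diag}(d) + U U^\top$, with $U \in \mathbb{R}^{D \times K}$. The NumPy sketch below illustrates how such a family can be sampled and its log-determinant evaluated without forming $\Sigma$. It is an assumption-laden sketch, not the authors' PyTorch engine, and the helper names `sample_lowrank_normal` and `logdet_lowrank` are hypothetical. Samples use the reparameterisation $w = \mu + \sqrt{d} \odot \epsilon_1 + U \epsilon_2$. The log-determinant, needed for the entropy term of the ELBO, uses the matrix determinant lemma.

```python
import numpy as np

def sample_lowrank_normal(mu, diag, U, n, rng):
    """Draw n reparameterised samples from N(mu, diag(diag) + U @ U.T).

    w = mu + sqrt(diag) * eps1 + U @ eps2 with eps1 ~ N(0, I_D), eps2 ~ N(0, I_K);
    the D x D covariance is never formed explicitly.
    """
    D, K = U.shape
    eps1 = rng.standard_normal((n, D))
    eps2 = rng.standard_normal((n, K))
    return mu + np.sqrt(diag) * eps1 + eps2 @ U.T

def logdet_lowrank(diag, U):
    """log det(diag(diag) + U U^T) via the matrix determinant lemma, O(D K^2)."""
    K = U.shape[1]
    capacitance = np.eye(K) + (U.T / diag) @ U   # I_K + U^T D^{-1} U
    return np.sum(np.log(diag)) + np.linalg.slogdet(capacitance)[1]

rng = np.random.default_rng(0)
D, K = 50, 3
mu = rng.normal(size=D)
diag = rng.uniform(0.5, 1.5, size=D)
U = 0.5 * rng.normal(size=(D, K))

# Check both helpers against the explicit dense covariance.
Sigma = np.diag(diag) + U @ U.T
print(np.isclose(logdet_lowrank(diag, U), np.linalg.slogdet(Sigma)[1]))  # True

w = sample_lowrank_normal(mu, diag, U, n=100_000, rng=rng)
print(np.abs(np.cov(w, rowvar=False) - Sigma).max())  # small sampling error

n_structured = D + D + D * K        # mean + diagonal + low-rank factor
n_full = D + D * (D + 1) // 2       # mean + full covariance (Cholesky factor)
print(n_structured, n_full)         # 250 1325
```

The rank K is the adjustable trade-off described above. K = 0 recovers mean-field VI, and each extra rank adds D parameters and captures one more direction of posterior correlation. PyTorch's `torch.distributions.LowRankMultivariateNormal(loc, cov_factor, cov_diag)` uses the same parameterisation. A mixture family (sGMM-VI) would combine several such components with mixing weights.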
