Bayesian Convolutional Neural Networks with Bernoulli Approximate Variational Inference
Yarin Gal, Zoubin Ghahramani
TL;DR
The paper tackles CNN overfitting in data-scarce regimes by formulating a Bayesian CNN that places a Bernoulli variational distribution over the network's kernels, implemented via dropout. By interpreting dropout as approximate variational inference and using Monte Carlo (MC) dropout at test time, it obtains predictive uncertainty estimates and robust regularization without added parameters. Empirical results on MNIST and CIFAR-10 show improved generalization and, in some architectures, state-of-the-art CIFAR-10 performance; the paper also examines convergence and test-time trade-offs. The work connects dropout to Gaussian processes and provides a practical, low-overhead approach to Bayesian CNNs, with guidance on when MC dropout is beneficial versus standard dropout.
Abstract
Convolutional neural networks (CNNs) work well on large datasets. But labelled data is hard to collect, and in some applications larger amounts of data are not available. The problem then is how to use CNNs with small data, as CNNs overfit quickly. We present an efficient Bayesian CNN, offering better robustness to overfitting on small data than traditional approaches. This is achieved by placing a probability distribution over the CNN's kernels. We approximate our model's intractable posterior with Bernoulli variational distributions, requiring no additional model parameters. On the theoretical side, we cast dropout network training as approximate inference in Bayesian neural networks. This allows us to implement our model using existing tools in deep learning with no increase in time complexity, while highlighting a negative result in the field. We show a considerable improvement in classification accuracy compared to standard techniques and improve on published state-of-the-art results for CIFAR-10.
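To make the idea of Monte Carlo dropout concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a toy two-layer dense network in place of a CNN (the dense layers stand in for convolutions), and the names `stochastic_forward`, `mc_dropout_predict`, `p_drop`, and `T` are hypothetical. Each forward pass keeps dropout active, which can be read as drawing one sample of the weights from a Bernoulli approximate posterior; averaging the softmax outputs over `T` passes gives an approximate predictive mean, and their spread gives a rough uncertainty estimate.

```python
import numpy as np


def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def stochastic_forward(x, W1, W2, p_drop, rng):
    """One forward pass with dropout left ON (assumed toy dense model)."""
    h = np.maximum(x @ W1, 0.0)             # hidden layer with ReLU
    keep = rng.random(h.shape) >= p_drop    # Bernoulli keep mask
    h = h * keep / (1.0 - p_drop)           # inverted-dropout rescaling
    return softmax(h @ W2)                  # class probabilities


def mc_dropout_predict(x, W1, W2, p_drop=0.5, T=50, seed=0):
    """Average T stochastic passes; return predictive mean and std."""
    rng = np.random.default_rng(seed)
    samples = np.stack(
        [stochastic_forward(x, W1, W2, p_drop, rng) for _ in range(T)]
    )                                        # shape (T, batch, classes)
    return samples.mean(axis=0), samples.std(axis=0)


# Toy usage with random weights (illustrative only, no training involved).
init_rng = np.random.default_rng(1)
W1 = init_rng.normal(size=(8, 16))
W2 = init_rng.normal(size=(16, 3))
x = init_rng.normal(size=(4, 8))
mean_probs, std_probs = mc_dropout_predict(x, W1, W2)
```

Standard dropout would instead run a single deterministic pass at test time with dropout disabled; the sketch differs only in keeping the masks random and averaging, which costs `T` forward passes per prediction.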
