Incorporating Unlabelled Data into Bayesian Neural Networks

Mrinank Sharma, Tom Rainforth, Yee Whye Teh, Vincent Fortuin

TL;DR

This work addresses a limitation of conventional Bayesian Neural Networks: they cannot use unlabelled data to improve their predictions. It introduces Self-Supervised BNNs (SS-BNNs), which learn a function-space prior from unlabelled data by generating pseudo-labelled contrastive tasks and optimizing a variational lower bound on the marginal likelihood. The learned priors yield prior predictives that better capture semantic similarity. This improves predictive performance, particularly in low-label regimes, and also helps with robustness to out-of-distribution data and with active learning. The approach gives contrastive learning a principled Bayesian interpretation and shows that unlabelled data can meaningfully inform function priors, which matters in practice for label-efficient learning and uncertainty quantification.

Abstract

Conventional Bayesian Neural Networks (BNNs) are unable to leverage unlabelled data to improve their predictions. To overcome this limitation, we introduce Self-Supervised Bayesian Neural Networks, which use unlabelled data to learn models with suitable prior predictive distributions. This is achieved by leveraging contrastive pretraining techniques and optimising a variational lower bound. We then show that the prior predictive distributions of self-supervised BNNs capture problem semantics better than conventional BNN priors. In turn, our approach offers improved predictive performance over conventional BNNs, especially in low-budget regimes.
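
To make the idea of a pseudo-labelled contrastive task concrete, the following is a minimal sketch of how one such task could be generated from unlabelled inputs. It is an illustration under stated assumptions, not the authors' implementation: the function names `make_contrastive_task` and `jitter`, the choice of two views per input, and the additive-noise augmentation are all hypothetical. The only property taken from the text is that augmentations of the same input share a label while distinct inputs receive different labels.

```python
import numpy as np

def make_contrastive_task(unlabelled, augment, num_sources, rng):
    """Build one pseudo-labelled task from unlabelled inputs.

    Picks `num_sources` distinct inputs, creates two augmented views of each,
    and labels every view with the index of the input it came from.
    (Two views per input is an assumption made for this sketch.)
    """
    idx = rng.choice(len(unlabelled), size=num_sources, replace=False)
    views, pseudo_labels = [], []
    for label, i in enumerate(idx):
        for _ in range(2):
            views.append(augment(unlabelled[i], rng))
            pseudo_labels.append(label)
    return np.stack(views), np.array(pseudo_labels)

def jitter(x, rng, scale=0.05):
    """Hypothetical augmentation: small additive Gaussian noise."""
    return x + scale * rng.normal(size=x.shape)

rng = np.random.default_rng(0)
unlabelled = rng.normal(size=(100, 8))  # stand-in for an unlabelled dataset
views, pseudo_labels = make_contrastive_task(unlabelled, jitter, num_sources=4, rng=rng)
```

Here `views` holds eight augmented inputs and `pseudo_labels` pairs them up by source. Repeating the call yields many such tasks, each with its own labels, which is what makes it possible to condition on them without any ground-truth annotation.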

Paper Structure (44 sections, 11 equations, 6 figures, 5 tables, 1 algorithm)


Figures (6)

  • Figure 1: Self-Supervised Bayesian Neural Networks. (a) Pre-training in self-supervised BNNs corresponds to unsupervised prior learning. We learn a model with a prior distribution such that augmented images likely have the same label and distinct images likely have different labels under the prior predictive. (b) Self-supervised BNN priors assign higher probabilities to semantically consistent image pairs having the same label compared to semantically inconsistent image pairs. Here, semantically consistent image pairs have the same ground-truth label, and semantically inconsistent image pairs have different ground-truth labels. The plot shows a kernel density estimate of the log-probability that same-class and different-class image pairs are assigned the same label under the prior. (c) Unlike self-supervised prior predictives, conventional BNN prior predictives assign similar probabilities to semantically consistent and semantically inconsistent image pairs having the same label.
  • Figure 2: BNN Probabilistic Models. (a) Probabilistic model for conventional BNNs. (b) Probabilistic model for self-supervised BNNs. We share parameters between different tasks, which allows us to condition on generated self-supervised data. $j$ indexes self-supervised tasks, $i$ indexes datapoints.
  • Figure 3: BNN Prior Predictives. We investigate prior predictives by computing the probability $\rho$ that particular image pairs have the same label under the prior, and examining the distribution of $\rho$ across different sets of image pairs. We consider three sets of differing semantic similarity: (i) augmented images; (ii) images of the same class; and (iii) images of different classes. Left: Conventional BNN prior. Right: Self-supervised BNN learnt prior predictive. The self-supervised learnt prior reflects the semantic similarity of the different image pairs better than the BNN prior, as seen in the spread between the different distributions.
  • Figure 4: Low-Budget Active Learning on CIFAR10. We compare (i) a self-supervised BNN, (ii) SimCLR, and (iii) a deep ensemble. For the self-supervised BNN and the ensemble, we acquire points with BALD. We use predictive entropy for SimCLR, which does not provide epistemic uncertainty estimates. Mean and std. shown (3 seeds). The methods that incorporate unlabelled data perform best by far, with our method slightly outperforming SimCLR.
  • Figure : Self-Supervised BNNs
  • ...and 1 more figure
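
The quantity $\rho$ from the Figure 3 caption, the probability that two inputs have the same label under the prior, can be illustrated with a small Monte Carlo sketch for a conventional BNN prior. Everything specific here is an assumption made for illustration: the two-layer tanh network, the fan-in-scaled Gaussian prior, and the estimator $\rho \approx \frac{1}{S}\sum_s \sum_c p(c \mid x_a, \theta_s)\, p(c \mid x_b, \theta_s)$ with $\theta_s$ drawn from the prior. The paper's exact definition and architecture may differ.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def sample_mlp_prior(rng, d_in, d_hidden, n_classes):
    """Draw one set of weights from a Gaussian prior scaled by fan-in (assumed)."""
    w1 = rng.normal(size=(d_in, d_hidden)) / np.sqrt(d_in)
    w2 = rng.normal(size=(d_hidden, n_classes)) / np.sqrt(d_hidden)
    return w1, w2

def predict(params, x):
    """Class probabilities of a two-layer tanh network for input x."""
    w1, w2 = params
    return softmax(np.tanh(x @ w1) @ w2)

def same_label_prob(x_a, x_b, rng, n_samples=500, d_hidden=32, n_classes=10):
    """Monte Carlo estimate of rho: average over prior samples of
    sum_c p(c | x_a, theta) * p(c | x_b, theta)."""
    total = 0.0
    for _ in range(n_samples):
        params = sample_mlp_prior(rng, x_a.shape[-1], d_hidden, n_classes)
        total += float(np.sum(predict(params, x_a) * predict(params, x_b)))
    return total / n_samples

rng = np.random.default_rng(0)
x = rng.normal(size=16)
x_near = x + 0.01 * rng.normal(size=16)  # stand-in for an augmented view
x_far = rng.normal(size=16)              # stand-in for an unrelated input
rho_near = same_label_prob(x, x_near, rng)
rho_far = same_label_prob(x, x_far, rng)
```

Computing `same_label_prob` over many pairs from each of the three sets (augmented, same class, different class) and plotting the resulting values would give distributions of the kind the caption describes.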