Making Better Use of Unlabelled Data in Bayesian Active Learning

Freddie Bickford Smith, Adam Foster, Tom Rainforth

TL;DR

This work proposes a simple framework for semi-supervised Bayesian active learning. It produces better-performing models than either conventional Bayesian active learning or semi-supervised learning with randomly acquired data, and it is easier to scale up than the conventional approach.

Abstract

Fully supervised models are predominant in Bayesian active learning. We argue that their neglect of the information present in unlabelled data harms not just predictive performance but also decisions about what data to acquire. Our proposed solution is a simple framework for semi-supervised Bayesian active learning. We find it produces better-performing models than either conventional Bayesian active learning or semi-supervised learning with randomly acquired data. It is also easier to scale up than the conventional approach. As well as supporting a shift towards semi-supervised models, our findings highlight the importance of studying models and acquisition methods in conjunction.

Paper Structure

This paper contains 30 sections, 1 theorem, 5 equations, 8 figures and 2 tables.

Key Result

Proposition 1

$\mathrm{EPIG}(x) \leq \mathrm{BALD}(x)$, with equality if $y_*$ is a one-to-one function of $\theta$ for all $x_*$.
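
The inequality in Proposition 1 can be checked numerically. The sketch below is not the authors' code. It assumes the standard information-theoretic definitions, which this summary does not restate: $\mathrm{BALD}(x) = I(y; \theta \mid x)$ and $\mathrm{EPIG}(x) = \mathbb{E}_{x_*}[I(y; y_* \mid x, x_*)]$, both estimated from the same $K$ parameter samples. The function names, array shapes and random softmax outputs standing in for a model are all hypothetical.

```python
import numpy as np


def entropy(p, axis=-1):
    """Shannon entropy in nats; the clip only guards log(0), since 0 * log(0) := 0."""
    return -np.sum(p * np.log(np.clip(p, 1e-300, None)), axis=axis)


def bald(probs_x):
    """BALD(x) = I(y; theta | x), estimated from K parameter samples.

    probs_x: array [K, C], where row k is p(y | x, theta_k).
    """
    marginal = probs_x.mean(axis=0)  # p(y | x)
    return entropy(marginal) - entropy(probs_x).mean()


def epig(probs_x, probs_targets):
    """EPIG(x) = E_{x_*}[ I(y; y_* | x, x_*) ], using the same K samples.

    probs_x:       array [K, C], p(y | x, theta_k)
    probs_targets: array [M, K, D], p(y_* | x_*^m, theta_k) for M target inputs
    """
    K = probs_x.shape[0]
    # Joint predictive p(y, y_* | x, x_*), averaged over theta_k: shape [M, C, D].
    joint = np.einsum("kc,mkd->mcd", probs_x, probs_targets) / K
    # Product of the marginals p(y | x) p(y_* | x_*): shape [M, C, D].
    indep = probs_x.mean(axis=0)[None, :, None] * probs_targets.mean(axis=1)[:, None, :]
    log_ratio = np.log(np.clip(joint, 1e-300, None)) - np.log(np.clip(indep, 1e-300, None))
    mutual_info = np.sum(joint * log_ratio, axis=(1, 2))  # one value per target input
    return mutual_info.mean()


def softmax(logits):
    z = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return z / z.sum(axis=-1, keepdims=True)


# Hypothetical stand-in for a model: random class probabilities per parameter sample.
rng = np.random.default_rng(0)
K, C, M = 20, 5, 8
probs_x = softmax(rng.normal(size=(K, C)))
probs_targets = softmax(rng.normal(size=(M, K, C)))

bald_score = bald(probs_x)
epig_score = epig(probs_x, probs_targets)
print(f"BALD = {bald_score:.4f}, EPIG = {epig_score:.4f}")
```

Under the empirical distribution over the $K$ samples, $y$ and $y_*$ are conditionally independent given $\theta$, so the data-processing inequality gives the bound exactly for these estimators, up to floating-point error. Passing one-hot target predictions that identify each $\theta_k$, for example `np.eye(K)[None]`, illustrates the equality case.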

Figures (8)

  • Figure 1: Semi-supervised models produce much better predictions than fully supervised ones. At the same time, using the right acquisition method is critical for effective active learning. BALD, which targets direct reductions in parameter uncertainty, does not consistently outperform random acquisition here; EPIG, a prediction-oriented alternative, does.
  • Figure 2: Bayesian deep learning can struggle to appropriately decompose a model's estimated uncertainty. Most of the uncertainty of the model here is ultimately reducible, as demonstrated by the decrease in total uncertainty between training on a small amount of data ($N=10$) and training on more data ($N=10{,}000$). This suggests the irreducible-uncertainty estimates produced by these methods are inconsistent. Using deep ensembles, Monte Carlo dropout and mean-field variational inference, much of the uncertainty deemed to be irreducible at $N=10$ is resolved at $N=10{,}000$. Using Laplace approximation, the irreducible uncertainty increases on many of the test inputs after training on more data. See the section on experiment details for more information.
  • Figure 3: EPIG's boost over random acquisition is relatively robust to the choice of encoder and prediction head in semi-supervised models applied to MNIST data. BALD only convincingly beats random for two out of four configurations.
  • Figure 4: BALD and EPIG provide especially big gains in predictive performance relative to random acquisition when the pool of unlabelled data is not carefully curated.
  • Figure 5: Semi-supervised Bayesian active learning performs strongly on ImageNet data, with EPIG producing a particularly notable boost over random acquisition.
  • ...and 3 more figures

Theorems & Definitions (1)

  • Proposition 1