Making Better Use of Unlabelled Data in Bayesian Active Learning

Freddie Bickford Smith; Adam Foster; Tom Rainforth

TL;DR

This work proposes a simple framework for semi-supervised Bayesian active learning. It produces better-performing models than either conventional Bayesian active learning or semi-supervised learning with randomly acquired data, and it is easier to scale up than the conventional approach.

Abstract

Fully supervised models are predominant in Bayesian active learning. We argue that their neglect of the information present in unlabelled data harms not just predictive performance but also decisions about what data to acquire. Our proposed solution is a simple framework for semi-supervised Bayesian active learning. We find it produces better-performing models than either conventional Bayesian active learning or semi-supervised learning with randomly acquired data. It is also easier to scale up than the conventional approach. As well as supporting a shift towards semi-supervised models, our findings highlight the importance of studying models and acquisition methods in conjunction.

Paper Structure (30 sections, 1 theorem, 5 equations, 8 figures, 2 tables)

Introduction
Background
Problems with current models
A semi-supervised approach to Bayesian active learning
Semi-supervised model
Fixing the encoder
Update consistency
Computational cost
Acquisition function
Related work
Experiments
Semi-supervised models outperform fully supervised ones but BALD is unreliable
EPIG produces a reliable boost over both BALD and random acquisition
The semi-supervised approach works with messy data and at scale
Semi-supervised models allow much faster Bayesian active learning
...and 15 more sections

Key Result

Proposition 1

$\mathrm{EPIG}(x) \leq \mathrm{BALD}(x)$, with equality if $y_*$ is a one-to-one function of $\theta$ for all $x_*$.
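The inequality in Proposition 1 can be checked numerically on a toy example. The sketch below is not the authors' code. It assumes the usual definitions from the literature, which this page does not state: BALD as the mutual information between the label $y$ and the parameters $\theta$, and EPIG as the mutual information between $y$ and a target prediction $y_*$, averaged over target inputs $x_*$. Both are estimated from a finite set of parameter samples. The function names `entropy`, `bald` and `epig`, and the toy probabilities, are hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution given as a list."""
    return -sum(q * math.log(q) for q in p if q > 0)


def bald(probs):
    """Estimate I(y; theta | x) from probs[k][c] = p(y=c | x, theta_k).

    Assumes the K parameter samples are equally weighted.
    """
    k = len(probs)
    n_classes = len(probs[0])
    marginal = [sum(row[c] for row in probs) / k for c in range(n_classes)]
    return entropy(marginal) - sum(entropy(row) for row in probs) / k


def epig(probs, target_probs):
    """Estimate E_{x*}[I(y; y* | x, x*)].

    target_probs[j][k][c] = p(y*=c | x*_j, theta_k), using the same
    parameter samples theta_k as probs (an assumption of this sketch).
    """
    k = len(probs)
    n_y = len(probs[0])
    total = 0.0
    for probs_star in target_probs:
        n_star = len(probs_star[0])
        # Joint predictive: average over theta of p(y | theta) * p(y* | theta).
        joint = [[sum(probs[i][a] * probs_star[i][b] for i in range(k)) / k
                  for b in range(n_star)] for a in range(n_y)]
        p_y = [sum(joint[a]) for a in range(n_y)]
        p_star = [sum(joint[a][b] for a in range(n_y)) for b in range(n_star)]
        # Mutual information between y and y* under the joint predictive.
        total += sum(joint[a][b] * math.log(joint[a][b] / (p_y[a] * p_star[b]))
                     for a in range(n_y) for b in range(n_star)
                     if joint[a][b] > 0)
    return total / len(target_probs)


# Toy data: two parameter samples, binary labels, two target inputs.
probs = [[0.9, 0.1], [0.1, 0.9]]
targets = [[[0.8, 0.2], [0.3, 0.7]], [[0.5, 0.5], [0.5, 0.5]]]
print("BALD:", bald(probs), "EPIG:", epig(probs, targets))
```

With equally weighted samples, both estimates are exact mutual informations under the same empirical joint distribution, so the data-processing inequality gives `epig(...) <= bald(...)` up to floating-point error. Setting the target predictions to one-hot vectors that identify the parameter sample reproduces the equality case of the proposition.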

Figures (8)

Figure 1: Semi-supervised models produce much better predictions than fully supervised ones. At the same time, using the right acquisition method is critical for effective active learning. BALD, which targets direct reductions in parameter uncertainty, does not consistently outperform random acquisition here; EPIG, a prediction-oriented alternative, does.
Figure 2: Bayesian deep learning can struggle to appropriately decompose a model's estimated uncertainty. Most of the uncertainty of the model here is ultimately reducible, as demonstrated by the decrease in total uncertainty between training on a small amount of data ($N=10$) and training on more data ($N=10{,}000$). This suggests the irreducible-uncertainty estimates produced by these methods are inconsistent. Using deep ensembles, Monte Carlo dropout and mean-field variational inference, much of the uncertainty deemed to be irreducible at $N=10$ is resolved at $N=10{,}000$. Using Laplace approximation, the irreducible uncertainty increases on many of the test inputs after training on more data. See the paper's experiment details section for details.
Figure 3: EPIG's boost over random acquisition is relatively robust to the choice of encoder and prediction head in semi-supervised models applied to MNIST data. BALD only convincingly beats random for two out of four configurations.
Figure 4: BALD and EPIG provide especially big gains in predictive performance relative to random acquisition when the pool of unlabelled data is not carefully curated.
Figure 5: Semi-supervised Bayesian active learning performs strongly on ImageNet data, with EPIG producing a particularly notable boost over random acquisition.
...and 3 more figures

Theorems & Definitions (1)

Proposition 1
