Myopic Bayesian Decision Theory for Batch Active Learning with Partial Batch Label Sampling

Kangping Hu, Stephen Mussmann

TL;DR

This work casts batch active learning as a Bayesian decision problem, derives a myopic acquisition framework that encompasses established criteria (e.g., EPIG, EER) and BAIT, and addresses batching by introducing Partial Batch Label Sampling (ParBaLS). ParBaLS incrementally builds a partial batch using sampled pseudo-labels and aggregates across multiple universes, achieving scalable batch selection with complexity roughly $O(T B m)$ and improving performance on both tabular and image-embedding tasks. Empirical results across ten datasets show ParBaLS-EPIG consistently outperforms baselines, while ParBaLS-MAP provides a faster alternative with competitive gains; BatchBALD and other exact batch methods can be computationally prohibitive at larger batch sizes. The authors provide code at https://github.com/ADDAPT-ML/ParBaLS and highlight the practical impact of principled batching in uncertainty-aware AL for real-world label budgets.

Abstract

Over the past couple of decades, many active learning acquisition functions have been proposed, leaving practitioners with an unclear choice of which to use. Bayesian Decision Theory (BDT) offers a universal principle to guide decision-making. In this work, we derive BDT for (Bayesian) active learning in the myopic framework, where we imagine we only have one more point to label. This derivation leads to effective algorithms such as Expected Error Reduction (EER), Expected Predictive Information Gain (EPIG), and other algorithms that appear in the literature. Furthermore, we show that BAIT (active learning based on V-optimal experimental design) can be derived from BDT and asymptotic approximations. A key challenge of such methods is that they scale poorly to large batch sizes, leading to either computational challenges (BatchBALD) or dramatic performance drops (top-$B$ selection). Here, using a particular formulation of the decision process, we derive Partial Batch Label Sampling (ParBaLS) for the EPIG algorithm. We show experimentally on several datasets that ParBaLS EPIG gives superior performance for a fixed budget when using Bayesian Logistic Regression on neural embeddings. Our code is available at https://github.com/ADDAPT-ML/ParBaLS.

Paper Structure

This paper contains 33 sections, 27 equations, 5 figures, 3 tables, 2 algorithms.

Figures (5)

  • Figure 1: An illustration of the proposed method, ParBaLS. Each square denotes a datapoint: the number is the index, and the color represents the label. A dark color means the sample is labeled (by the labeler or the pseudo-labeler). After the AL system selects the sample(s) for labeling, it sends their indices (e.g., {7} for Single Point AL or {5, 7, 8, 0} for Batch AL) to the labeler, who returns their true labels (e.g., {blue} for Single Point AL or {red, blue, red, blue} for Batch AL). In ParBaLS, we focus on Batch AL and reduce it to Single Point AL. In the figure, we have already committed to a partial batch of {5, 7}. While we don't know the true labels, we can run Single Point AL in alternative universes (Universe 1, 2, and 3) in which we have sampled pseudo-labels. Within each universe, we train a model on {1, 2, 3, 5, 7}. We can then average the active learning acquisition scores across universes to, for example, choose 8 as the next point to label. Point 8 is added to the partial batch, and each universe's model is updated with that universe's pseudo-label for 8. This process continues until the partial batch is complete (with $B$ datapoints), at which point the AL system sends the indices of the selected batch to the labeler, as shown in Batch AL.
  • Figure 2: Test accuracy on tabular datasets with Bayesian Logistic Regression, where each of the 10 iterations has a labeling budget of 20 samples, except for the first iteration that starts with 100 samples.
  • Figure 3: Test accuracy on one-vs-all image datasets with fixed encoders and trainable Bayesian Logistic Regression layer, where each of the 10 iterations has a labeling budget of 20 samples, except for the first iteration that starts with 100 samples.
  • Figure 4: Test accuracy on subpopulation-shifted image datasets with fixed encoders and trainable Bayesian Logistic Regression layer, where each of the 10 iterations has a labeling budget of 20 samples, except for the first iteration that starts with 100 samples.
  • Figure 5: Test accuracy on tabular datasets with Bayesian Logistic Regression, where each of the 100 iterations has a labeling budget of 1 sample, except for the first iteration that starts with 100 samples.
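
The partial-batch loop described in the Figure 1 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the model is a hypothetical stand-in (L2-regularised logistic regression fit by gradient descent rather than Bayesian Logistic Regression), the acquisition score is predictive entropy used as a placeholder rather than EPIG, and pseudo-labels are assumed to be drawn from each universe's current predictive probability. All function names (`fit_logreg`, `entropy_score`, `parbals_select`) are invented for this sketch.

```python
import numpy as np

def fit_logreg(X, y, steps=200, lr=0.5, l2=1.0):
    """Stand-in model (assumption): L2-regularised logistic regression."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * (X.T @ (p - y) + l2 * w) / len(y)
    return w

def predict_proba(w, X):
    """Probability of the positive class under weights w."""
    return 1.0 / (1.0 + np.exp(-X @ w))

def entropy_score(w, X_pool):
    """Placeholder acquisition score (predictive entropy), NOT EPIG."""
    p = np.clip(predict_proba(w, X_pool), 1e-12, 1 - 1e-12)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

def parbals_select(X_lab, y_lab, X_pool, batch_size, n_universes, rng,
                   score_fn=entropy_score):
    """Build a batch one point at a time, averaging scores across universes."""
    # Every universe starts from the same truly labeled data.
    universes = [(X_lab.copy(), y_lab.copy()) for _ in range(n_universes)]
    weights = [fit_logreg(X_lab, y_lab) for _ in range(n_universes)]
    available = np.ones(len(X_pool), dtype=bool)
    batch = []
    for _ in range(batch_size):
        # Single-point acquisition in each universe, then average the scores.
        scores = np.mean([score_fn(w, X_pool) for w in weights], axis=0)
        scores[~available] = -np.inf
        idx = int(np.argmax(scores))
        batch.append(idx)
        available[idx] = False
        # Each universe samples its own pseudo-label and retrains its model.
        for k in range(n_universes):
            p = predict_proba(weights[k], X_pool[idx:idx + 1])[0]
            pseudo = float(rng.random() < p)
            Xk, yk = universes[k]
            Xk = np.vstack([Xk, X_pool[idx]])
            yk = np.append(yk, pseudo)
            universes[k] = (Xk, yk)
            weights[k] = fit_logreg(Xk, yk)
    return batch

# Usage on synthetic data (illustrative only).
rng = np.random.default_rng(0)
X_lab = rng.normal(size=(20, 3))
y_lab = (X_lab[:, 0] > 0).astype(float)
X_pool = rng.normal(size=(50, 3))
batch = parbals_select(X_lab, y_lab, X_pool, batch_size=4, n_universes=3, rng=rng)
print(batch)  # indices into X_pool that would be sent to the labeler
```

Because pseudo-labels differ between universes, the per-universe models diverge as the partial batch grows, so the averaged score is less likely to pick several near-duplicate points than a plain top-$B$ selection from a single model. The sketch retrains each universe's model from scratch after every pick, which is simple but not tuned for speed.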