Adaptive Batch Sizes for Active Learning: A Probabilistic Numerics Approach

Masaki Adachi; Satoshi Hayakawa; Martin Jørgensen; Xingchen Wan; Vu Nguyen; Harald Oberhauser; Michael A. Osborne

TL;DR

This work proposes a novel Probabilistic Numerics framework that adaptively changes batch sizes by framing batch selection as a quadrature task, and demonstrates that this approach significantly enhances learning efficiency and flexibility in diverse Bayesian batch active learning and Bayesian optimization applications.

Abstract

Active learning parallelization is widely used, but typically relies on fixing the batch size throughout experimentation. This fixed approach is inefficient because of a dynamic trade-off between cost and speed -- larger batches are more costly, smaller batches lead to slower wall-clock run-times -- and the trade-off may change over the run (larger batches are often preferable earlier). To address this trade-off, we propose a novel Probabilistic Numerics framework that adaptively changes batch sizes. By framing batch selection as a quadrature task, our integration-error-aware algorithm facilitates the automatic tuning of batch sizes to meet predefined quadrature precision objectives, akin to how typical optimizers terminate based on convergence thresholds. This approach obviates the necessity for exhaustive searches across all potential batch sizes. We also extend this to scenarios with constrained active learning and constrained optimization, interpreting constraint violations as reductions in the precision requirement, to subsequently adapt batch construction. Through extensive experiments, we demonstrate that our approach significantly enhances learning efficiency and flexibility in diverse Bayesian batch active learning and Bayesian optimization applications.
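The idea of fixing a quadrature precision and letting the batch size follow from it can be illustrated with a small sketch. This is not the paper's algorithm, which the abstract and outline describe as a linear programming formulation with a Nyström approximation. It is a simplified greedy stand-in under stated assumptions: the target measure is uniform over a finite candidate set, the kernel is a squared-exponential kernel with a hand-picked lengthscale, and the names `rbf_kernel`, `worst_case_error`, and `adaptive_batch` are hypothetical. Points are added one at a time until the RKHS worst-case integration error falls below the tolerance, so the batch size is an output rather than an input.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=0.5):
    """Squared-exponential kernel matrix between the rows of a and b."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / lengthscale**2)


def worst_case_error(K, idx, jitter=1e-8):
    """Worst-case quadrature error over the RKHS unit ball for nodes idx.

    The target measure is assumed uniform over all candidates, and the
    weights solve the (jitter-regularized) least-squares problem.
    """
    z = K.mean(axis=1)  # kernel mean embedding evaluated at each candidate
    c = K.mean()        # kernel integrated against the measure in both arguments
    if len(idx) == 0:
        return float(np.sqrt(c)), np.zeros(0)
    K_bb = K[np.ix_(idx, idx)]
    w = np.linalg.solve(K_bb + jitter * np.eye(len(idx)), z[idx])
    err_sq = w @ K_bb @ w - 2.0 * w @ z[idx] + c
    return float(np.sqrt(max(err_sq, 0.0))), w


def adaptive_batch(X_cand, tol, max_batch=None):
    """Greedily grow a batch until the worst-case error is at most tol."""
    K = rbf_kernel(X_cand, X_cand)
    n = len(X_cand)
    max_batch = n if max_batch is None else min(max_batch, n)
    selected = []
    err, w = worst_case_error(K, selected)
    while err > tol and len(selected) < max_batch:
        best_err, best_j = None, None
        for j in range(n):
            if j in selected:
                continue
            trial_err, _ = worst_case_error(K, selected + [j])
            if best_err is None or trial_err < best_err:
                best_err, best_j = trial_err, j
        selected.append(best_j)
        err, w = worst_case_error(K, selected)
    return selected, w, err


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_cand = rng.uniform(size=(60, 2))  # synthetic candidate pool
    for tol in (0.3, 0.1, 0.03):
        batch, weights, err = adaptive_batch(X_cand, tol)
        print(f"tol={tol}: batch size {len(batch)}, error {err:.4f}")
```

Because the greedy path does not depend on the tolerance, a tighter tolerance simply continues further along the same sequence of points, so the batch size can only stay the same or grow as the precision requirement is tightened. The brute-force greedy search here costs far more than an LP-based selection would and is only meant to make the stopping rule concrete.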

Paper Structure (87 sections, 1 theorem, 45 equations, 6 figures, 3 tables)

Introduction
Contributions
Background
Adaptive Batch Active Learning
Problem Setting of Batch Active Learning
Problem Setting of Kernel Quadrature
Kernel Quadrature via Nyström Approximation
Linear Programming Formulation
Adaptive Batch Sizes
Unknown Constraints As The Lowered Precision Requirement
Error Bounds
How to Solve The LP Problem
Related Work
Batch Active Learning and Optimization
Kernel Quadrature
...and 72 more sections

Key Result

Proposition 1

Under the above setting, let $\textbf{w}_*$ be the optimal solution of the LP, and let $\textbf{X}_\text{batch}$ be the subset of $\textbf{X}_\text{cand}$, corresponding to the nonzero entries of $\textbf{w}_*$ (denoted by $\textbf{w}_\text{batch}$). Suppose that $\tilde{\textbf{X}}_\text{batch}$ and, for any function $f$ in the RKHS with kernel $K$, where $\lVert f \rVert$ is the RKHS norm of

Figures (6)

Figure 1: We fix the quadrature precision instead of the batch size. The batch size changes adaptively to meet the predefined precision requirement. Our method, AdaBatAL, efficiently determines the batch size and the querying positions without requiring a brute-force search over all possible batch sizes. AdaBatAL also offers adaptive batch sizes for constrained active learning and constrained Bayesian optimization.
Figure 2: Constrained batch active learning. As the increased violation risk $\epsilon_\text{vio}$ propagates to the tolerance $\epsilon_\text{LP}$, reward maximization is subsequently prioritized over quadrature, resulting in safe batch samples.
Figure 3: Batch Bayesian optimization results on Hartmann ($d=6$): (a) convergence plot with ($n \leq 5$). (b) Batch size variability ($n \leq 100$). The tolerance is set to ($\epsilon = 10^{-1}, 10^{-2}, 10^{-3}, 10^{-4}$). (c) Total queries vs. simple regret at the last iteration for the results of (a) and (b). For fixed batch size methods, the mean batch size of AdaBatAL is used ($n = 5, 30, 50, 73, 90$). The plot shows mean $\pm$ standard error of the mean.
Figure 4: Tolerance effect on constrained batch BO on Branin ($d=2$): the balance between (a) violation rate and expected reward, and (b) worst-case error and log determinant. (c) Tolerance adaptively controls violation rate, and (d) outperforms the fixed cases. (a), (b), and (c) are dual-Y-axis plots in which the color and arrow indicate which Y axis to read.
Figure 5: Convergence plots of both constrained batch active learning and Bayesian optimization results across 5 synthetic functions and 7 real-world tasks. $d$ is the dimension, and $c$ is the number of unknown constraints. The metric is negative log marginal likelihood (NLML) for active learning tasks, and log regret or log best observation for optimization tasks. Lines and shaded areas denote mean ± 1 standard error.
...and 1 more figures

Theorems & Definitions (2)

Proposition 1
Proof of Proposition 1
