Optimal Initialization of Batch Bayesian Optimization
Jiuge Ren, David Sweet
TL;DR
This work tackles the efficiency of batch Bayesian optimization in low-budget settings by introducing Minimal Terminal Variance (MTV), an acquisition function that optimizes an initial batch and all subsequent batches. MTV embodies an I-Optimality concept weighted by the probability that a point is optimal, $p_*(x)$, and evaluates the objective via a Monte Carlo approximation of the terminal GP variance $\nabla \,\sigma^2(x|x_a)$ with respect to batch arms $x_a$, using fantasized GPs for fast computations. The method relies on three pillars: sampling from $p_*(x)$ with a problem-specific MCMC, minimizing the integral approximation of MTV, and careful initialization of the acquisition function optimizer. Empirical results on standard test functions and reinforcement learning simulators show that MTV outperforms common initialization and batch-design baselines across dimensions and problem types, and it remains compatible with ensemble approaches, offering a practical pathway to more informative experiments in both field and simulation contexts.
Abstract
Field experiments and computer simulations are effective but time-consuming methods of measuring the quality of engineered systems at different settings. To reduce the total time required, experimenters may employ Bayesian optimization, which is parsimonious with measurements, and take measurements of multiple settings simultaneously, in a batch. In practice, experimenters use very few batches; thus, it is imperative that each batch be as informative as possible. Typically, the initial batch in a Batch Bayesian Optimization (BBO) is constructed from a quasi-random sample of setting values. We propose a batch-design acquisition function, Minimal Terminal Variance (MTV), that designs a batch by optimization rather than random sampling. MTV adapts a design criterion from Design of Experiments, called I-Optimality, which minimizes the variance of the post-evaluation estimates of quality, integrated over the entire space of settings. MTV weights the integral by the probability that a setting is optimal, making it able to design not only an initial batch but all subsequent batches as well. Applicability to both initialization and subsequent batches is novel among acquisition functions. Numerical experiments on test functions and simulators show that MTV compares favorably to other BBO methods.
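As an illustration of the weighted I-Optimality idea described above, the following is a minimal sketch, not the authors' implementation. It assumes a zero-mean GP with a unit-variance RBF kernel and a fixed lengthscale, and it replaces the MCMC sampler for $p_*(x)$ with a hypothetical stand-in (Gaussian samples clustered around an assumed optimum). Under those assumptions the criterion is approximated as $\mathrm{MTV}(x_a) \approx \frac{1}{N}\sum_{i=1}^{N} \sigma^2(x_i|x_a)$ with $x_i \sim p_*(x)$. Because a GP's posterior variance does not depend on the observed values, the "fantasized" variance can be computed from the arm locations alone. The function names `rbf_kernel`, `posterior_variance`, and `mtv` are illustrative choices.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=0.2):
    # Squared-exponential kernel with unit prior variance (an assumption).
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def posterior_variance(x, x_cond, noise=1e-6, lengthscale=0.2):
    # GP posterior variance at x after conditioning on locations x_cond.
    # The variance depends only on locations, never on measured values.
    if len(x_cond) == 0:
        return np.ones(len(x))
    K = rbf_kernel(x_cond, x_cond, lengthscale) + noise * np.eye(len(x_cond))
    k = rbf_kernel(x_cond, x, lengthscale)      # shape (n_cond, n_x)
    sol = np.linalg.solve(K, k)
    return 1.0 - (k * sol).sum(axis=0)

def mtv(x_arms, x_samples, x_obs, **kw):
    # Monte Carlo estimate: mean terminal variance over samples x_i ~ p_*(x),
    # conditioning on past observations plus the proposed batch arms.
    x_cond = np.vstack([x_obs, x_arms])
    return posterior_variance(x_samples, x_cond, **kw).mean()

rng = np.random.default_rng(0)
# Hypothetical stand-in for MCMC draws from p_*(x): points near (0.5, 0.5).
x_samples = np.clip(rng.normal(0.5, 0.05, size=(200, 2)), 0.0, 1.0)
x_obs = np.empty((0, 2))                        # initial batch: no data yet

arms_near = x_samples[:4]                       # arms where p_* has mass
arms_far = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])

mtv_near = mtv(arms_near, x_samples, x_obs)
mtv_far = mtv(arms_far, x_samples, x_obs)
print(f"MTV near p_* mass: {mtv_near:.4f}, MTV at corners: {mtv_far:.4f}")
```

In this sketch, arms placed where $p_*(x)$ has mass yield a lower estimated terminal variance than arms at the corners of the domain, which is the behavior the criterion is meant to reward. A batch designer would minimize `mtv` over `x_arms` with a numerical optimizer; that step is omitted here.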
