Optimizing Posterior Samples for Bayesian Optimization via Rootfinding
Taiwo A. Adebiyi, Bach Do, Ruda Zhang
TL;DR
This work tackles a bottleneck in Bayesian optimization: the inner-loop optimization of acquisition functions based on posterior samples is hard, especially in high dimensions. It introduces TS-roots, a global optimization strategy that uses pathwise conditioning and a separable GP prior to select a small, informative set of starting points for gradient-based optimizers, comprising exploration and exploitation candidates, and that scales nearly linearly with dimension. The authors also present a sample-average posterior function to explicitly balance exploration and exploitation, and they demonstrate substantial improvements in both inner-loop optimization for GP-TS and outer-loop performance on benchmark problems and a real-world ten-bar truss design. The approach leverages spectral representations and univariate global rootfinding to efficiently characterize the minima of prior samples and propagate that structure into the multivariate setting, providing a robust, scalable alternative to random-start or population-based methods. The work includes open-source code and shows that TS-roots improves information-theoretic acquisition functions such as MES, suggesting broad practical impact for accelerating BO in challenging, high-dimensional tasks.
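The univariate rootfinding step mentioned above can be illustrated with a minimal sketch. It assumes a smooth one-dimensional function on an interval, approximates it with a Chebyshev interpolant from NumPy, and takes as local minima the roots of the derivative that have positive curvature, plus any endpoint from which the function increases into the interval. The function name `local_minima_1d`, the interpolation degree, and the tolerances are hypothetical choices for illustration and are not taken from the TS-roots code.

```python
import numpy as np
from numpy.polynomial import Chebyshev


def local_minima_1d(f, lower, upper, degree=40, coef_tol=1e-12):
    """Approximate the local minima of a smooth 1D function on [lower, upper].

    Illustrative sketch only; degree and tolerances are assumed values.
    """
    # Interpolate f at Chebyshev points, then drop negligible trailing
    # coefficients so the rootfinding problem stays well conditioned.
    p = Chebyshev.interpolate(f, degree, domain=[lower, upper]).trim(coef_tol)
    dp, d2p = p.deriv(1), p.deriv(2)

    # Roots of the derivative are the critical points of the interpolant.
    roots = dp.roots()
    real_roots = roots[np.abs(np.imag(roots)) < 1e-8].real
    inside = real_roots[(real_roots > lower) & (real_roots < upper)]

    # Interior local minima have positive curvature.
    interior = inside[d2p(inside) > 0]

    # An endpoint is a local minimum if f increases into the interval.
    ends = []
    if dp(lower) > 0:
        ends.append(lower)
    if dp(upper) < 0:
        ends.append(upper)

    xs = np.sort(np.concatenate([interior, np.array(ends, dtype=float)]))
    return xs, p(xs)


# Example: sin(3x) on [-1, 2] has interior minima at -pi/6 and pi/2.
xs, vals = local_minima_1d(lambda x: np.sin(3 * x), -1.0, 2.0)
```

Because every critical point in the interval is found at once, the smallest of the returned values is a global minimum of the interpolant on that interval, which is what distinguishes this from a local search started at random points.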
Abstract
Bayesian optimization devolves the global optimization of a costly objective function to the global optimization of a sequence of acquisition functions. This inner-loop optimization can be catastrophically difficult if it involves posterior sample paths, especially in higher dimensions. We introduce an efficient global optimization strategy for posterior samples based on global rootfinding. It provides gradient-based optimizers with two sets of judiciously selected starting points, designed to combine exploration and exploitation. The number of starting points can be kept small without sacrificing optimization quality. Remarkably, even with just one point from each set, the global optimum is discovered most of the time. The algorithm scales practically linearly to high dimensions, breaking the curse of dimensionality. For Gaussian process Thompson sampling (GP-TS), we demonstrate remarkable improvement in both inner- and outer-loop optimization, surprisingly outperforming alternatives like EI and GP-UCB in most cases. Our approach also improves the performance of other posterior sample-based acquisition functions, such as variants of entropy search. Furthermore, we propose a sample-average formulation of GP-TS, which has a parameter to explicitly control exploitation and can be computed at the cost of one posterior sample. Our implementation is available at https://github.com/UQUH/TSRoots.
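The claim that the sample-average formulation costs only one posterior sample can be motivated with a short sketch. It rests on an assumption that the abstract does not spell out: that the sample-average function is the mean of N independent posterior sample paths. For a Gaussian posterior with mean mu and covariance kappa, that mean is itself Gaussian with mean mu and covariance kappa / N, so a single sample f can be rescaled as mu + (f - mu) / sqrt(N). The names `sample_average_path` and `n_avg`, the RBF kernel, and the toy data are hypothetical and serve only as illustration.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=0.2):
    """Squared-exponential kernel between two 1D point sets."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)


def gp_posterior(x_train, y_train, x_grid, noise=1e-6):
    """Posterior mean and covariance of a zero-mean GP on a grid."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf_kernel(x_grid, x_train)
    mean = Ks @ np.linalg.solve(K, y_train)
    cov = rbf_kernel(x_grid, x_grid) - Ks @ np.linalg.solve(K, Ks.T)
    return mean, cov


def sample_average_path(mean, sample, n_avg):
    """Rescale one posterior sample so it is distributed like the
    average of n_avg i.i.d. posterior samples (assumed definition)."""
    return mean + (sample - mean) / np.sqrt(n_avg)


rng = np.random.default_rng(0)
x_train = np.array([0.1, 0.4, 0.9])
y_train = np.sin(6 * x_train)
x_grid = np.linspace(0.0, 1.0, 50)
mean, cov = gp_posterior(x_train, y_train, x_grid)

# Draw one posterior sample via an eigendecomposition, clipping tiny
# negative eigenvalues caused by round-off.
w, V = np.linalg.eigh(cov)
w = np.clip(w, 0.0, None)
sample = mean + V @ (np.sqrt(w) * rng.standard_normal(len(w)))

# n_avg = 1 recovers standard GP-TS; larger n_avg pulls the path toward
# the posterior mean, which favors exploitation.
avg_16 = sample_average_path(mean, sample, n_avg=16)
```

Under this reading, the averaging parameter acts as the exploitation control: as `n_avg` grows, the rescaled path approaches the posterior mean, and at `n_avg = 1` it is an ordinary Thompson sample.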
