Optimizing Posterior Samples for Bayesian Optimization via Rootfinding

Taiwo A. Adebiyi, Bach Do, Ruda Zhang

TL;DR

This work tackles a bottleneck in Bayesian optimization: the inner-loop optimization of acquisition functions based on posterior samples is hard, especially in high dimensions. It introduces TS-roots, a global optimization strategy that uses pathwise conditioning and a separable GP prior to select a small, informative set of starting points for gradient-based optimizers, comprising exploration and exploitation candidates, and achieves near-linear scaling in dimension. The authors also present a sample-average posterior function that explicitly balances exploration and exploitation, and they demonstrate substantial improvements in both inner-loop optimization for GP-TS and outer-loop performance on benchmark problems and a real-world ten-bar truss design. The approach uses spectral representations and univariate global rootfinding to characterize the minima of prior samples efficiently and carries that structure into the multivariate setting, providing a robust, scalable alternative to random-start or population-based methods. The code is open source, and the results show that TS-roots also improves information-theoretic acquisition functions such as MES, which suggests it could accelerate BO in challenging, high-dimensional tasks.

Abstract

Bayesian optimization devolves the global optimization of a costly objective function to the global optimization of a sequence of acquisition functions. This inner-loop optimization can be catastrophically difficult if it involves posterior sample paths, especially in higher dimensions. We introduce an efficient global optimization strategy for posterior samples based on global rootfinding. It provides gradient-based optimizers with two sets of judiciously selected starting points, designed to combine exploration and exploitation. The number of starting points can be kept small without sacrificing optimization quality. Remarkably, even with just one point from each set, the global optimum is discovered most of the time. The algorithm scales practically linearly to high dimensions, breaking the curse of dimensionality. For Gaussian process Thompson sampling (GP-TS), we demonstrate remarkable improvement in both inner- and outer-loop optimization, surprisingly outperforming alternatives like EI and GP-UCB in most cases. Our approach also improves the performance of other posterior sample-based acquisition functions, such as variants of entropy search. Furthermore, we propose a sample-average formulation of GP-TS, which has a parameter to explicitly control exploitation and can be computed at the cost of one posterior sample. Our implementation is available at https://github.com/UQUH/TSRoots.
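The two sets of starting points can be illustrated in one dimension. The sketch below is not the TSRoots implementation: the prior sample, the observed inputs `X_OBS`, the update weights `COEFS`, and the length scale `ELL` are all hypothetical stand-ins, the data-dependent update is a fixed sum of Gaussian bumps rather than a real pathwise-conditioning term, and a grid scan with bisection stands in for the global rootfinder. It only shows the shape of the idea: find the local minima of the prior sample by rootfinding on its derivative (exploration set), add the observed locations (exploitation set), and run a gradient-based local optimizer from each.

```python
import math

# Toy stand-ins (assumptions for illustration, not the paper's model).
X_OBS = [0.15, 0.30, 0.50]   # hypothetical observed inputs (exploitation set)
COEFS = [0.4, -0.3, 0.6]     # hypothetical weights of the data-dependent update
ELL = 0.05                   # hypothetical length scale of the update bumps

def prior(x):
    return math.sin(7 * x) + 0.5 * math.cos(13 * x)

def d_prior(x):
    return 7 * math.cos(7 * x) - 6.5 * math.sin(13 * x)

def d2_prior(x):
    return -49 * math.sin(7 * x) - 84.5 * math.cos(13 * x)

def posterior(x):
    """Toy posterior sample: prior sample plus a fixed update term."""
    update = sum(c * math.exp(-((x - xo) ** 2) / (2 * ELL ** 2))
                 for c, xo in zip(COEFS, X_OBS))
    return prior(x) + update

def bisect(g, a, b, tol=1e-12):
    """Root of g in [a, b], assuming g(a) and g(b) have opposite signs."""
    ga = g(a)
    while b - a > tol:
        m = 0.5 * (a + b)
        gm = g(m)
        if ga * gm <= 0:
            b = m
        else:
            a, ga = m, gm
    return 0.5 * (a + b)

def prior_minima(lo=0.0, hi=1.0, n=2000):
    """Interior local minima of the prior sample (exploration set)."""
    grid = [lo + (hi - lo) * i / n for i in range(n + 1)]
    roots = [bisect(d_prior, a, b) for a, b in zip(grid, grid[1:])
             if d_prior(a) * d_prior(b) < 0]
    return [r for r in roots if d2_prior(r) > 0]

def descend(f, x, lo=0.0, hi=1.0, step=1e-3, iters=2000, h=1e-6):
    """Projected gradient descent with a central-difference gradient."""
    for _ in range(iters):
        grad = (f(x + h) - f(x - h)) / (2 * h)
        x = min(hi, max(lo, x - step * grad))
    return x

def optimize_sample():
    starts = prior_minima() + X_OBS          # exploration + exploitation
    finals = [descend(posterior, s) for s in starts]
    best = min(finals, key=posterior)
    return best, posterior(best), starts

if __name__ == "__main__":
    best_x, best_val, starts = optimize_sample()
    print(len(starts), best_x, best_val)
```

In this toy setup the update term is negligible far from the observed inputs, so the best point is reached from a local minimum of the prior sample, which mirrors the case in which the minimizer lies outside the interpolation region.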

Paper Structure

This paper contains 56 sections, 2 theorems, 29 equations, 14 figures, 7 algorithms.

Key Result

Proposition 1

The set of strong local minima of the prior sample $f_{\omega}(\mathbf{x})$ can be written as: where tensor grids $\Xi^{(j)} = \prod_{i=1}^d \Xi_i^{(j)}$, $j \in \{0, 1\}$. The set $\widehat{X}_{\omega}$ of strong local maxima of $f_{\omega}(\mathbf{x})$ has an analogous representation, and satisfies $\widehat{X}_{\omega} \sqcup \breve{X}_{\omega} = \Xi^{(0)} \sqcup \Xi^{(1)}$, where $\sqcup$ denotes disjoint union.
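A tensor-grid structure of this kind can be checked numerically on a toy case. The sketch below assumes the sample factorizes as a product of univariate functions, $f(\mathbf{x}) = \prod_i f_i(x_i)$, which is an assumption made for illustration and is not stated in this excerpt. Under that assumption the Hessian at a point whose coordinates are all univariate critical points is diagonal, with entries of sign $\operatorname{sign}(f_i f_i'') \cdot \operatorname{sign}(f)$, so the extrema come from two tensor grids built by splitting each coordinate's critical points by the sign of $f_i f_i''$. The factors in `FACTORS` are hypothetical, the names `neg` and `pos` are not the paper's labels and their correspondence to $\Xi^{(0)}$ and $\Xi^{(1)}$ is not claimed, and a grid scan with bisection stands in for a proper global rootfinder.

```python
import itertools
import math

# Hypothetical univariate factors on [0, 1], given as (f_i, f_i', f_i'').
FACTORS = [
    (lambda t: math.sin(7 * t) + 1.5,
     lambda t: 7 * math.cos(7 * t),
     lambda t: -49 * math.sin(7 * t)),
    (lambda t: math.cos(8 * t) + 4 * t - 1.5,
     lambda t: -8 * math.sin(8 * t) + 4,
     lambda t: -64 * math.cos(8 * t)),
]

def critical_points(df, lo=0.0, hi=1.0, n=1000, tol=1e-12):
    """Interior roots of df, from grid sign changes refined by bisection."""
    roots = []
    for k in range(n):
        a = lo + (hi - lo) * k / n
        b = lo + (hi - lo) * (k + 1) / n
        if df(a) * df(b) < 0:
            while b - a > tol:
                m = 0.5 * (a + b)
                a, b = (a, m) if df(a) * df(m) <= 0 else (m, b)
            roots.append(0.5 * (a + b))
    return roots

def classify(factors):
    """Split each factor's critical points by the sign of f_i * f_i''."""
    neg, pos = [], []
    for f, df, d2f in factors:
        crit = critical_points(df)
        neg.append([t for t in crit if f(t) * d2f(t) < 0])
        pos.append([t for t in crit if f(t) * d2f(t) > 0])
    return neg, pos

def product_value(factors, x):
    """Value of the separable product at the point x."""
    return math.prod(f(t) for (f, _, _), t in zip(factors, x))

def strong_local_minima(factors):
    """Minima: 'neg' grid points with f < 0 and 'pos' grid points with f > 0."""
    neg, pos = classify(factors)
    grid_neg = list(itertools.product(*neg))
    grid_pos = list(itertools.product(*pos))
    minima = [x for x in grid_neg if product_value(factors, x) < 0]
    minima += [x for x in grid_pos if product_value(factors, x) > 0]
    return minima, grid_neg, grid_pos

if __name__ == "__main__":
    minima, grid_neg, grid_pos = strong_local_minima(FACTORS)
    print(len(grid_neg), len(grid_pos), minima)
```

The point of the construction is cost: only univariate derivatives are root-found, and the multivariate candidates are assembled combinatorially, so no multivariate search is needed to enumerate the extrema of the product.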

Figures (14)

  • Figure 1: Illustrations of exploration and exploitation sets for the global optimization of GP-TS acquisition functions in one dimension (top row) and two dimensions (bottom row). Left column: When the global minimum $\widetilde{\mathbf{x}}_{\widetilde{\omega}}^\star$ of the GP-TS acquisition function lies outside the interpolation region, it is typically identified by starting the gradient-based multistart optimizer at a local minimum of the prior sample. Right column: When $\widetilde{\mathbf{x}}_{\widetilde{\omega}}^\star$ is within the interpolation region, it can be found by starting the optimizer at either an observed location or a local minimum of the prior sample.
  • Figure 2: Outer-loop optimization results for the (a) 2D Schwefel, (b) 4D Rosenbrock, (c) 10D Levy, (d) 16D Ackley, and (e) 16D Powell functions. The plots are histories of medians and interquartile ranges of solution values from 20 runs of TS-roots, TS-DSRF (i.e., TS using decoupled sampling with random Fourier features), TS-RF (i.e., TS using random Fourier features), EI, and LCB.
  • Figure 3: Inner-loop optimization results by rootfinding, a gradient-based multistart optimizer with random starting points (random multistart), and a genetic algorithm for (a) the 2D Schwefel and (b) 4D Rosenbrock functions. The plots are cumulative values of optimized GP-TS acquisition functions $\alpha_k^\star$, cumulative distances between new solution points ${\bf x}_k^\star$ and the true global minima ${\bf x}_k^\text{t}$ of the acquisition functions, and cumulative CPU times $t_k$ for optimizing the acquisition functions.
  • Figure 4: Inner-loop optimization results by rootfinding, a gradient-based multistart optimizer with random starting points (random multistart), and a genetic algorithm for (a) the 10D Levy, (b) 16D Ackley, and (c) 16D Powell functions. The plots are cumulative values of optimized GP-TS acquisition functions $\alpha_k^\star$ and cumulative CPU times $t_k$ for optimizing the acquisition functions.
  • Figure 5: Performance of MES-R 10 and MES-R 50 for (a) the 4D Rosenbrock function, (b) the 6D Hartmann function, and (c) the 10D Levy function when TS-RF and TS-roots are used for generating random samples from $f^\star|\mathcal{D}$. The plots are histories of medians and interquartile ranges of solutions from ten runs of each method.
  • ...and 9 more figures

Theorems & Definitions (2)

  • Proposition 1
  • Theorem 1