On Thompson Sampling and Bilateral Uncertainty in Additive Bayesian Optimization

Nathan Wycoff

TL;DR

This work advances additive Bayesian optimization by enabling efficient Thompson Sampling (TS) that accounts for bilateral uncertainty (BU) in additive Gaussian processes. Through a careful analysis of the joint and marginal posterior structure, it introduces residuals based on conditional independence that allow joint TS without Fourier feature approximations. Empirical results across synthetic, game-like, and cosmology-inspired tasks indicate that accounting for BU yields limited practical gains, supporting the viability of BU-ignoring approaches in non-asymptotic settings. The findings suggest that, under typical evaluation budgets, research effort is better spent on the scalability and robustness of additive BO than on detailed modeling of cross-dimension uncertainty.

Abstract

In Bayesian Optimization (BO), additive assumptions can mitigate the twin difficulties of modeling and searching a complex function in high dimension. However, common acquisition functions, like the Additive Lower Confidence Bound, ignore pairwise covariances between dimensions, which we'll call bilateral uncertainty (BU), imposing a second layer of approximation. While theoretical results indicate that asymptotically not much is lost in doing so, little is known about the practical effects of this assumption under small budgets. In this article, we show that by exploiting conditional independence, Thompson Sampling respecting BU can be conducted efficiently. We use this fact to carry out an empirical investigation into the loss incurred by ignoring BU, finding that the additive approximation to Thompson Sampling does indeed perform worse, on balance, than the exact method, but that this difference is of little practical significance. This buttresses the theoretical understanding and suggests that the BU-ignoring approximation is sufficient for BO in practice, even in the non-asymptotic regime.
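To make the notion of BU concrete, the following minimal sketch assumes a two-dimensional input split into two one-dimensional additive components, each with a unit-variance squared-exponential kernel (lengthscale 0.3), and Gaussian noise variance 0.01. These settings, the toy data, and the helper names `rbf` and `component_posterior` are illustrative assumptions, not taken from the paper. The sketch computes the exact joint posterior of the two component values at one query point and compares the predictive variance of their sum with and without the cross-component (bilateral) covariance term.

```python
import numpy as np

# Illustrative assumptions (not from the paper): two 1-D additive components,
# unit-variance squared-exponential kernels with lengthscale 0.3, noise 1e-2.
NOISE = 1e-2

def rbf(a, b, ls=0.3):
    """Unit-variance squared-exponential kernel between 1-D input arrays."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ls) ** 2)

def component_posterior(X, y, x_star, noise=NOISE):
    """Joint posterior mean and 2x2 covariance of (f1(x_star[0]), f2(x_star[1]))
    for y = f1(X[:, 0]) + f2(X[:, 1]) + eps with independent GP priors."""
    n = len(y)
    Ky = rbf(X[:, 0], X[:, 0]) + rbf(X[:, 1], X[:, 1]) + noise * np.eye(n)
    # Row 0: Cov(f1(x_star[0]), y); row 1: Cov(f2(x_star[1]), y).
    C = np.vstack([rbf(x_star[:1], X[:, 0]), rbf(x_star[1:], X[:, 1])])
    A = np.linalg.solve(Ky, C.T)       # Ky^{-1} C^T, shape (n, 2)
    mean = A.T @ y
    cov = np.eye(2) - C @ A            # prior covariance is the identity here
    return mean, cov

rng = np.random.default_rng(0)
X = rng.uniform(size=(10, 2))
y = np.sin(6 * X[:, 0]) + np.cos(4 * X[:, 1]) + 0.1 * rng.standard_normal(10)
x_star = np.array([0.5, 0.5])

mean, cov = component_posterior(X, y, x_star)
var_exact = cov[0, 0] + cov[1, 1] + 2.0 * cov[0, 1]   # Var[f1 + f2], BU kept
var_no_bu = cov[0, 0] + cov[1, 1]                      # bilateral term dropped
print(f"with BU: {var_exact:.4f}  without BU: {var_no_bu:.4f}")
```

The exact predictive variance is $\mathrm{Var}[f_1] + \mathrm{Var}[f_2] + 2\,\mathrm{Cov}[f_1, f_2]$; a BU-ignoring acquisition function works only with the first two terms.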

Paper Structure

This paper contains 26 sections, 1 theorem, 19 equations, 12 figures, 3 tables, 2 algorithms.

Key Result

Proposition 3.1

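For any $\mathbf{Z}_i\in\mathbb{R}^{B_i\times d_i}$ and $\mathbf{Z}_j\in\mathbb{R}^{B_j\times d_j}$ with $i \neq j$, we have that $f_i(\mathbf{Z}_i) \perp\!\!\!\perp f_j(\mathbf{Z}_j) \mid \mathbf{y} - f_j(\mathcal{P}_j \mathbf{X})$, where $\mathcal{P}_j \mathbf{X}$ denotes the training inputs projected onto the $d_j$ coordinates of component $j$.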
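The residual construction behind Proposition 3.1 can be sketched for two one-dimensional components, again assuming unit-variance squared-exponential kernels with lengthscale 0.3 and noise variance 0.01. The names `residual_joint_draw`, `gauss_draw`, and `exact_joint_posterior` are hypothetical, and the paper's own algorithms may organize the computation differently. Component 1 is sampled jointly at the candidate points and at the training inputs; subtracting its training values from $\mathbf{y}$ leaves a residual that depends only on component 2 and the noise, so component 2 can then be sampled from an ordinary single-component GP posterior fitted to that residual. Because the first draw includes the training inputs, the cross-component covariance is retained, and the result is an exact joint draw rather than the BU-ignoring approximation.

```python
import numpy as np

# Illustrative assumptions (not from the paper): two 1-D additive components,
# unit-variance squared-exponential kernels with lengthscale 0.3, noise 1e-2.
NOISE = 1e-2

def rbf(a, b, ls=0.3):
    """Unit-variance squared-exponential kernel between 1-D input arrays."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ls) ** 2)

def gauss_draw(rng, mean, cov, jitter=1e-6):
    """One multivariate normal draw via a jittered Cholesky factor."""
    L = np.linalg.cholesky(cov + jitter * np.eye(len(mean)))
    return mean + L @ rng.standard_normal(len(mean))

def residual_joint_draw(rng, X, y, Z1, Z2, noise=NOISE):
    """Hypothetical helper: joint draw of [f1(Z1), f2(Z2)] | y via residuals."""
    x1, x2, n = X[:, 0], X[:, 1], len(y)
    Ky = rbf(x1, x1) + rbf(x2, x2) + noise * np.eye(n)
    # Step 1: draw f1 jointly at the candidates Z1 and the training inputs x1.
    T = np.concatenate([Z1, x1])
    C1 = rbf(T, x1)                              # Cov(f1(T), y)
    A1 = np.linalg.solve(Ky, C1.T)               # Ky^{-1} Cov(y, f1(T))
    f1 = gauss_draw(rng, A1.T @ y, rbf(T, T) - C1 @ A1)
    f1_Z1, f1_train = f1[:len(Z1)], f1[len(Z1):]
    # Step 2: r = y - f1(x1) equals f2(x2) + noise, so given r, f2 no longer
    # depends on f1; sample it from a one-component GP regression on r.
    r = y - f1_train
    K2 = rbf(x2, x2) + noise * np.eye(n)
    C2 = rbf(Z2, x2)
    A2 = np.linalg.solve(K2, C2.T)
    f2_Z2 = gauss_draw(rng, A2.T @ r, rbf(Z2, Z2) - C2 @ A2)
    return np.concatenate([f1_Z1, f2_Z2])

def exact_joint_posterior(X, y, Z1, Z2, noise=NOISE):
    """Reference: exact mean and full covariance of [f1(Z1), f2(Z2)] | y."""
    x1, x2, n = X[:, 0], X[:, 1], len(y)
    Ky = rbf(x1, x1) + rbf(x2, x2) + noise * np.eye(n)
    C = np.vstack([rbf(Z1, x1), rbf(Z2, x2)])
    A = np.linalg.solve(Ky, C.T)
    B1, B = len(Z1), len(Z1) + len(Z2)
    prior = np.zeros((B, B))
    prior[:B1, :B1] = rbf(Z1, Z1)    # f1 and f2 are independent a priori,
    prior[B1:, B1:] = rbf(Z2, Z2)    # so the prior is block diagonal
    return A.T @ y, prior - C @ A

rng = np.random.default_rng(1)
X = rng.uniform(size=(8, 2))
y = np.sin(6 * X[:, 0]) + np.cos(4 * X[:, 1])
Z1, Z2 = np.array([0.2, 0.7]), np.array([0.4, 0.9])
draws = np.array([residual_joint_draw(rng, X, y, Z1, Z2) for _ in range(20000)])
mean_exact, cov_exact = exact_joint_posterior(X, y, Z1, Z2)
```

Each call returns one Thompson sample; averaging many of them should reproduce the mean and the full covariance, including the off-diagonal (bilateral) blocks, of the exact joint posterior of $[f_1(\mathbf{Z}_1), f_2(\mathbf{Z}_2)]$.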

Figures (12)

  • Figure 1: Illustration of Additive Thompson Sampling. Top left panel gives true response surface; top right gives posterior mean of an additive GP. The middle row gives the predictive variance of an additive GP (left) and its approximation without bilateral terms (right). The bottom row gives a predictive draw from each.
  • Figure 2: Example joint distribution in 2D. Left shows the posterior conditional; right shows the conditional given the residuals.
  • Figure 3: Results for Random Additive Ackley functions. Solid line gives median performance and dotted lines give an asymptotic 95% confidence interval thereon. These results are aggregated over 100 replicates.
  • Figure 4: Results for Random Additive Levy functions. Solid line gives median performance and dotted lines give an asymptotic 95% confidence interval thereon. These results are aggregated over 100 replicates.
  • Figure 5: Results for Random Additive Rastrigin functions. Solid line gives median performance and dotted lines give an asymptotic 95% confidence interval thereon. These results are aggregated over 100 replicates.
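The captions do not state how the asymptotic interval on the median is constructed. One standard option, assumed here rather than taken from the paper, is the distribution-free order-statistic interval: across $n$ replicates, the number of runs falling below the true median is Binomial$(n, 1/2)$, approximately Normal$(n/2, n/4)$, so the sorted values at ranks near $n/2 \pm 1.96\sqrt{n}/2$ bracket the median with roughly 95% coverage. The function name `median_with_ci` and the demo data are hypothetical.

```python
import numpy as np

def median_with_ci(results, z=1.96):
    """Hypothetical helper. results: (replicates, iterations) array.
    Returns the per-iteration median and an approximate 95% CI built from
    order statistics, using Binomial(n, 1/2) ~ Normal(n/2, n/4) (an assumption
    about how such an interval could be formed, not the paper's stated method)."""
    n = results.shape[0]
    s = np.sort(results, axis=0)
    half_width = z * np.sqrt(n) / 2.0
    lo = max(int(np.floor(n / 2.0 - half_width)), 0)      # 0-based rank
    hi = min(int(np.ceil(n / 2.0 + half_width)), n - 1)   # 0-based rank
    return np.median(results, axis=0), s[lo], s[hi]

# Made-up demo data: 100 replicates x 5 iterations, values 0..99 per column.
demo = np.tile(np.arange(100.0)[:, None], (1, 5))
med, lower, upper = median_with_ci(demo)
```

With 100 replicates, as in Figures 3 to 5, this selects the 41st and 61st order statistics (0-based indices 40 and 60).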

Theorems & Definitions (1)

  • Proposition 3.1