Cost-aware Bayesian Optimization via the Pandora's Box Gittins Index

Qian Xie; Raul Astudillo; Peter I. Frazier; Ziv Scully; Alexander Terenin

TL;DR

This work addresses cost-aware Bayesian optimization by developing a connection to the Pandora's Box problem and its Gittins index solution. The authors derive the Pandora's Box Gittins Index (PBGI), an acquisition-function class that incorporates evaluation costs in both the budget-constrained and cost-per-sample settings, with extensions to stochastic and unknown costs. On the theoretical side, they show that the Gittins index policy is Bayesian-optimal in the Pandora's Box setting; on the practical side, they give algorithms for computing the index via bisection, together with gradient expressions. Empirically, PBGI variants perform competitively, often surpassing baselines on medium-to-high-dimensional and multimodal problems, and this performance carries over to settings without explicit evaluation costs. The results illustrate the potential of combining Gittins index theory with Bayesian optimization for cost-aware decision making.

Abstract

Bayesian optimization is a technique for efficiently optimizing unknown functions in a black-box manner. To handle practical settings where gathering data requires use of finite resources, it is desirable to explicitly incorporate function evaluation costs into Bayesian optimization policies. To understand how to do so, we develop a previously-unexplored connection between cost-aware Bayesian optimization and the Pandora's Box problem, a decision problem from economics. The Pandora's Box problem admits a Bayesian-optimal solution based on an expression called the Gittins index, which can be reinterpreted as an acquisition function. We study the use of this acquisition function for cost-aware Bayesian optimization, and demonstrate empirically that it performs well, particularly in medium-high dimensions. We further show that this performance carries over to classical Bayesian optimization without explicit evaluation costs. Our work constitutes a first step towards integrating techniques from Gittins index theory into Bayesian optimization.

Paper Structure (35 sections, 12 theorems, 67 equations, 20 figures)

Introduction
Cost-aware Bayesian optimization
Probabilistic models and acquisition functions
Expected improvement per unit cost
The Pandora's Box Gittins index for Bayesian optimization
The Pandora's Box problem
Optimally solving Pandora's Box
An acquisition function class for cost-aware Bayesian optimization
Computation
Extension to stochastic and non-automatically-differentiable costs
Qualitative behavior and comparisons
Experiments
Bayesian regret
Synthetic benchmarks
Empirical objectives
...and 20 more sections

Key Result

theorem 1

Let $X$ be a finite set, let $f : X \to \mathbb{R}$ be a finite-mean random function for which $f(x)$ is independent of $f(x')$ for $x \neq x'$, and let $c : X \to \mathbb{R}_+$, without loss of generality, be deterministic. Then, for the cost-per-sample problem, the policy defined by maximizing the Gittins index a
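The Gittins index named in the result above can be computed numerically. The following is a minimal sketch, not the authors' implementation, under these assumptions: the value at a point is Gaussian with mean `mu` and standard deviation `sigma`, the objective is maximized, and the index is the threshold `g` at which the expected improvement over `g` equals the scaled cost `lam * cost`. The function names, the bracketing strategy, and the tolerances are hypothetical choices made for illustration.

```python
import math


def expected_improvement(mu, sigma, g):
    """E[max(f - g, 0)] for f ~ N(mu, sigma^2), maximization convention."""
    z = (mu - g) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return sigma * (pdf + z * cdf)


def gittins_index(mu, sigma, cost, lam=1.0, tol=1e-10, max_iter=200):
    """Solve expected_improvement(mu, sigma, g) = lam * cost for g by bisection.

    Assumes sigma > 0 and lam * cost > 0. Expected improvement is strictly
    decreasing in g, so the root is unique once it is bracketed.
    """
    target = lam * cost
    lo, hi = mu - sigma, mu + sigma
    step = sigma
    # Expand the bracket downward until the expected improvement exceeds the target.
    while expected_improvement(mu, sigma, lo) < target:
        step *= 2.0
        lo -= step
    step = sigma
    # Expand the bracket upward until the expected improvement drops below the target.
    while expected_improvement(mu, sigma, hi) > target:
        step *= 2.0
        hi += step
    # Standard bisection on the bracketed root.
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if expected_improvement(mu, sigma, mid) > target:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Two hypothetical candidates with equal mean but different uncertainty and cost.
    candidates = {"cheap_certain": (0.0, 0.5, 0.05), "costly_uncertain": (0.0, 2.0, 0.5)}
    for name, (mu, sigma, cost) in candidates.items():
        print(name, gittins_index(mu, sigma, cost))
```

In an acquisition-function setting, one would evaluate this index at each candidate point using its posterior mean and standard deviation and then query the point with the largest index. Under the stated assumptions, a higher cost lowers the index and a larger standard deviation raises it.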

Figures (20)

Figure 1: An illustration of this work's key idea. We view cost-aware Bayesian optimization as an extension of the Pandora's Box problem, and derive the cost-aware acquisition function $\alpha^{\mathrm{PBGI}}_{t}$ by incorporating the posterior into the Bayesian-optimal Pandora's Box acquisition function $\alpha^\star$.
Figure 2: A Bayesian optimization problem with varying costs on which LogEIPC (a numerically stable implementation of EIPC; see the Experiments section and the LogEI appendix) has poor performance, inspired by Astudillo et al. (2021). The domain is $X = [-500,500]$, which we visualize on the subinterval $[-5,5]$. Left: illustration of the non-uniform prior variance, which is given by a Matérn-5/2 kernel scaled by a narrow bump function. Center: the cost function, which is a narrow bump-shaped function. Right: median regret curves and quartiles for LogEIPC and PBGI. Legend refers only to regret curves.
Figure 3: Left: contour plots showing how EI (left) and PBGI (center-left, center-right) depend on the posterior mean and standard deviation at a given point (lighter colors indicate higher values). We see that PBGI values high standard deviation more than EI. Right: PBGI performance across values of $\lambda$, under the setup of the Bayesian regret experiment of the Experiments section with $d=8$. We plot the median of a set of samples using $n=256$ random seeds, along with quartiles to show variability. We see that large $\lambda$-values decrease regret sooner, but eventually lose out to smaller $\lambda$-values.
Figure 4: Regret curves for objective functions sampled from the prior, shown using medians, as well as quartiles to indicate experiment variability. We see in the cost-aware setting that both PBGI variants usually exhibit comparable performance to LogEIPC and LogEICC, with PBGI-D decisively outperforming other baselines. In the uniform-cost setting, the story is similar for $d = 8$ and $d = 16$: PBGI and PBGI-D perform comparably to the best baselines, which are LogEI and UCB, as well as MSEI for $d=8$. However, for $d = 32$, all methods perform comparably to random search, with PBGI, PBGI-D, LogEI, and UCB having near-identical median performance. See the standard-errors figure for an alternative visualization using mean and standard error.
Figure 5: Synthetic benchmark regret curves, shown using medians, as well as quartiles to assess variability. All objective functions are defined with dimension $d = 16$. We see in the cost-aware setting that PBGI, PBGI-D, LogEIPC, and LogEICC all perform similarly on the heavily-multimodal Ackley function, matching or outperforming the non-myopic BMSEI baseline. On the Levy and Rosenbrock functions, PBGI-D matches, and for some cost budgets outperforms, all baselines, including the non-myopic BMSEI. Under uniform costs, PBGI performs well on Ackley and Levy, but is outperformed by PBGI-D and most baselines on Rosenbrock. See the standard-errors figure for an alternative visualization using mean and standard error.
...and 15 more figures

Theorems & Definitions (24)

theorem 1: Weitzman (1979)
theorem 2
proposition 1: Gradient of PBGI
proof
definition 1
lemma 1
proof
lemma 2
proof
lemma 3
...and 14 more
