Variational Entropy Search for Adjusting Expected Improvement

Nuojin Cheng, Stephen Becker

TL;DR

The paper shows that EI is a special case of MES when interpreted through variational inference, unifying the two acquisition functions under the Variational Entropy Search (VES) framework. It introduces ESLB, an ELBO-like lower bound, and extends the variational family from exponential to Gamma distributions, giving rise to the VES-Gamma algorithm, which balances exploration and exploitation. Empirically, VES-Gamma outperforms EI and MES on standard 2D test functions and on real-world hyperparameter-tuning tasks for XGBoost, indicating improved robustness to multi-modality and over-exploitation. The authors suggest exploring broader variational families (e.g., generalized Gamma) as future work to further improve Bayesian optimization performance.

Abstract

Bayesian optimization is a widely used technique for optimizing black-box functions, with Expected Improvement (EI) being the most commonly used acquisition function in this domain. While EI is often viewed as distinct from information-theoretic acquisition functions such as entropy search (ES) and max-value entropy search (MES), our work reveals that EI can be considered a special case of MES when approached through variational inference (VI). In this context, we have developed the Variational Entropy Search (VES) methodology and the VES-Gamma algorithm, which adapts EI by incorporating information-theoretic principles. The efficacy of VES-Gamma is demonstrated across a variety of test functions and real datasets, highlighting its theoretical and practical utility in Bayesian optimization scenarios.
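For readers unfamiliar with EI, the following is a minimal sketch of its well-known closed form under a Gaussian posterior, written for maximization. The function names and the argument `best` (the largest value observed so far) are illustrative choices, not notation from the paper, and this is not the VES-Gamma acquisition function.

```python
import math


def normal_pdf(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mu: float, sigma: float, best: float) -> float:
    """EI = E[max(y - best, 0)] for y ~ N(mu, sigma^2) (maximization).

    `best` is assumed to be the largest observation so far.
    """
    if sigma <= 0.0:
        # Degenerate posterior: the improvement is deterministic.
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    return (mu - best) * normal_cdf(z) + sigma * normal_pdf(z)


if __name__ == "__main__":
    # Hypothetical posterior mean/std at a candidate point.
    print(expected_improvement(mu=0.2, sigma=0.5, best=0.0))
```

The first term rewards candidates whose posterior mean already exceeds `best` (exploitation), while the second grows with the posterior standard deviation (exploration).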

Paper Structure

This paper contains 15 sections, 2 theorems, 19 equations, 5 figures, 1 table, 2 algorithms.

Key Result

Theorem 3.1

The MES acquisition function in Equation (eqn:mes) adheres to the Barber-Agakov (BA) bound proposed in [agakov2004algorithm] and can be lower bounded as follows, where $q(y^*\vert\mathcal{D}_t,y_{\bm x})$ is any chosen density function and $\text{KL}(\cdot\|\cdot)$ represents the Kullback-Leibler divergence. The inequality is tight if and only if $\mathbb{E}_{p(y_{\bm x}\vert\mathcal{D}_t)}[\text{
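The displayed inequality does not appear in this excerpt. For orientation only, the generic Barber-Agakov argument can be sketched as follows, assuming the MES acquisition function is the mutual information between $y^*$ and $y_{\bm x}$ given $\mathcal{D}_t$; the paper's exact statement and notation may differ.

```latex
\begin{align*}
I(y^*; y_{\bm x} \mid \mathcal{D}_t)
  &= H\!\left[p(y^* \mid \mathcal{D}_t)\right]
     + \mathbb{E}_{p(y^*, y_{\bm x} \mid \mathcal{D}_t)}
       \!\left[\log p(y^* \mid \mathcal{D}_t, y_{\bm x})\right] \\
  &= H\!\left[p(y^* \mid \mathcal{D}_t)\right]
     + \mathbb{E}_{p(y^*, y_{\bm x} \mid \mathcal{D}_t)}
       \!\left[\log q(y^* \mid \mathcal{D}_t, y_{\bm x})\right] \\
  &\quad + \mathbb{E}_{p(y_{\bm x} \mid \mathcal{D}_t)}
       \!\left[\text{KL}\!\left(p(y^* \mid \mathcal{D}_t, y_{\bm x})
       \,\|\, q(y^* \mid \mathcal{D}_t, y_{\bm x})\right)\right] \\
  &\ge H\!\left[p(y^* \mid \mathcal{D}_t)\right]
     + \mathbb{E}_{p(y^*, y_{\bm x} \mid \mathcal{D}_t)}
       \!\left[\log q(y^* \mid \mathcal{D}_t, y_{\bm x})\right].
\end{align*}
```

The last step drops the expected KL term, which is non-negative, so in this generic form the gap between the mutual information and the bound is exactly that expected KL divergence.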

Figures (5)

  • Figure 1: The distribution of $\bm x^*$ and $y^*$ approximated by kernel density estimation conditioned on given data points $\mathcal{D}_t$ and one potential evaluation $y_{\bm x}$ at $\bm x = 1.4$. The black crosses are sampled points from their corresponding distributions, which are drawn by sampling the posterior path $p(f\vert y_{\bm x}, \mathcal{D}_t)$. Entropy search methods are hindered by the lack of explicit formulation of $p(\bm x^*\vert y_{\bm x}, \mathcal{D}_t)$ or $p(y^*\vert y_{\bm x}, \mathcal{D}_t)$.
  • Figure 2: The approximation of $p(y^*\vert y_{\bm x}, \mathcal{D}_t)$ (black) in the right side of Figure \ref{fig:x_y_star} using moment matching within exponential distributions (blue) and Gamma distributions (orange) on the domain $[1.71, 4.00]$.
  • Figure 3: Comparison of EI, MES, and our proposed VES-Gamma (see Algorithm \ref{alg:VES-gamma}) acquisition functions. The top figure shows the test function (black) and the Gaussian posterior (orange) with current observed points (black crosses). The middle figure displays the values of $k$ (olive) and $\beta$ (pink) calculated in Equation \ref{eq:k-solve} and Equation \ref{eq:beta-solve}, reflecting the reliance of VES-Gamma on the EI acquisition function. The bottom figure compares the three acquisition functions and their maximizers. The acquisition function values are centered and re-scaled for comparison.
  • Figure 4: Performance comparison of VES, EI, and MES on three test functions, showing the mean and 0.1 standard deviation of the log regret.
  • Figure 5: Performance of VES, EI, and MES for XGBoost hyper-parameter tuning on the diabetes (left) and iris (right) datasets. The figure presents the mean and 0.1 standard deviation of ten trials with different random initial points.
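The moment matching mentioned in the Figure 2 caption can be illustrated with a small sketch. It assumes samples of $y^*$ are available, that the fitted distributions are shifted to start at a known lower bound (such as the left endpoint of the plotted domain), and that the Gamma distribution uses a shape `k` and a rate `beta`. The function names and sample values are hypothetical, and this is plain moment matching, not the paper's Equation eq:k-solve or eq:beta-solve.

```python
import statistics


def fit_shifted_exponential(samples, lower):
    """Match the mean of (sample - lower) with an exponential; return its rate."""
    shifted = [s - lower for s in samples]
    mean = statistics.fmean(shifted)
    if mean <= 0.0:
        raise ValueError("samples must lie above the lower bound on average")
    return 1.0 / mean  # exponential mean is 1 / rate


def fit_shifted_gamma(samples, lower):
    """Match mean and variance of (sample - lower) with Gamma(k, beta).

    Uses the shape/rate convention: mean = k / beta, variance = k / beta**2.
    """
    shifted = [s - lower for s in samples]
    mean = statistics.fmean(shifted)
    var = statistics.pvariance(shifted)
    if mean <= 0.0 or var <= 0.0:
        raise ValueError("need positive mean and variance after shifting")
    k = mean * mean / var
    beta = mean / var
    return k, beta


if __name__ == "__main__":
    # Hypothetical max-value samples; 1.71 mirrors the lower end of the
    # domain quoted in the Figure 2 caption.
    y_star_samples = [1.9, 2.2, 2.4, 2.8, 3.1]
    print(fit_shifted_exponential(y_star_samples, lower=1.71))
    print(fit_shifted_gamma(y_star_samples, lower=1.71))
```

A Gamma distribution with `k = 1` reduces to an exponential distribution, so the Gamma family contains the exponential family and its extra shape parameter lets it also match the variance of the samples.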

Theorems & Definitions (4)

  • Theorem 3.1
  • Theorem 3.2
  • proof
  • proof