Wasserstein-enabled characterization of designs and myopic decisions in Bayesian Optimization

Antonio Candelieri, Francesco Archetti

TL;DR

This work addresses the gap between Bayesian Optimization (BO) theory and practical deployments by introducing a Wasserstein-distance-based, model-free characterization of the current design via two distributional metrics: $\mathcal{S}_1(\mathcal{D})=\mathcal{W}_2^2(X,G_\mathcal{X})$ and $\mathcal{S}_2(\mathcal{D})=\mathcal{W}_2^2(Y,\delta_{y^+})$. By linking these metrics to the quality of myopic decisions, the authors show how coverage of the search space and concentration of observed values relate to misspecification risk and potential improvements, independent of a specific surrogate model. Empirical results reveal that the simple SR acquisition is robust across designs, and that the proposed metrics correlate with RMSE and immediate improvement, suggesting a path toward adaptive acquisition strategies that switch based on distributional properties. The work points toward a new generation of acquisition functions that explicitly leverage $\mathcal{S}_1(\mathcal{D})$ and $\mathcal{S}_2(\mathcal{D})$ to make BO more robust in expensive, real-world tasks.
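To make the two metrics concrete, the following is a minimal 1-dimensional sketch, not the authors' implementation. It rests on several assumptions that the summary does not confirm: $G_\mathcal{X}$ is taken to be a uniform grid over the interval `[lower, upper]`, the problem is a minimization so that $y^+$ is the smallest observed value, and the helper names (`w2_squared_1d`, `s1_coverage`, `s2_concentration`) are hypothetical. It uses two standard facts: in one dimension, $\mathcal{W}_2^2$ between two uniformly weighted empirical measures is the mean squared difference of their matched quantiles, and $\mathcal{W}_2^2$ to a Dirac mass $\delta_{y^+}$ is the mean squared deviation from $y^+$.

```python
import numpy as np

def w2_squared_1d(a, b):
    """Exact squared 2-Wasserstein distance between two 1-D empirical
    measures with uniform weights, via quantile matching."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    # Split the unit mass into len(a) * len(b) equal slices so that both
    # quantile functions are constant on every slice.
    a_rep = np.repeat(a, len(b))
    b_rep = np.repeat(b, len(a))
    return float(np.mean((a_rep - b_rep) ** 2))

def s1_coverage(x, lower, upper, grid_size=101):
    """S1 sketch: distance between the design locations and a uniform grid
    (assumed stand-in for G_X) on [lower, upper]. Smaller = better coverage."""
    grid = np.linspace(lower, upper, grid_size)
    return w2_squared_1d(x, grid)

def s2_concentration(y):
    """S2 sketch: distance between observed values and a Dirac mass at the
    best value y+ (assumed here to be the minimum)."""
    y = np.asarray(y, dtype=float)
    y_best = y.min()
    return float(np.mean((y - y_best) ** 2))

# Hypothetical designs on [0, 1]: one spread out, one clustered near 0.
spread = [0.1, 0.5, 0.9]
clustered = [0.0, 0.01, 0.02]
print(s1_coverage(spread, 0.0, 1.0), s1_coverage(clustered, 0.0, 1.0))
print(s2_concentration([1.0, 2.0, 3.0]))
```

Under these assumptions, a clustered design yields a larger $\mathcal{S}_1$ than a spread-out one, and $\mathcal{S}_2$ shrinks as the observed values concentrate around the best one. Neither quantity requires fitting a surrogate model. In higher dimensions the quantile shortcut no longer applies, and a general optimal-transport solver would be needed.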

Abstract

Impractical assumptions, an inherently myopic nature, and the crucial role of the initial design, all together contribute to making theoretical convergence proofs of little value in real-life Bayesian Optimization applications. In this paper, we propose a novel characterization of the design depending on its distributional properties, separately measured with respect to the coverage of the search space and the concentration around the best observed function value. These measures are based on the Wasserstein distance and enable a model-free evaluation of the information value of the design before deciding the next query. Then, embracing the myopic nature of Bayesian Optimization, we take an empirical approach to analyze the relation between the proposed characterization of the design and the quality of the next query. Ultimately, we provide important and useful insights that might inspire the definition of a new generation of acquisition functions in Bayesian Optimization.

Paper Structure (20 sections, 8 equations, 9 figures, 3 tables)


Figures (9)

  • Figure 1: A conceptual representation of how a design $\mathcal{D}$ is characterized by its distributional properties $\mathcal{S}_1(\mathcal{D})\stackrel{\text{def}}{=}\mathcal{W}_2^2(X,G_\mathcal{X})$ and $\mathcal{S}_2(\mathcal{D})\stackrel{\text{def}}{=}\mathcal{W}_2^2(Y,\delta_{y^+})$.
  • Figure 2: Thirty designs, with size 5, 12, and 20, on eight 1-dimensional test problems, separately for three types of design: LHS, LHS+$N(x^*)$, and LHS+$N(\tilde{x})$.
  • Figure 3: Thirty designs, with size 10, 25, and 40, on six 2-dimensional test problems, separately for three types of design: LHS, LHS+$N(x^*)$, and LHS+$N(\tilde{x})$.
  • Figure 4: Boxplot of the $\Delta_y$ values on the 1-dimensional problems: one chart for each type of design. Boxes refer to acquisition functions (i.e., SR: surface response, SD: maximization of the GP's standard deviation, EI: Expected Improvement, and LCB: Lower Confidence Bound). All the acquisition functions share the same GP model.
  • Figure 5: Boxplot of the $\Delta_y$ values on the 2-dimensional problems: one chart for each type of design. Boxes refer to acquisition functions (i.e., SR: surface response, SD: maximization of the GP's standard deviation, EI: Expected Improvement, and LCB: Lower Confidence Bound). All the acquisition functions share the same GP model.
  • ...and 4 more figures