Wasserstein-enabled characterization of designs and myopic decisions in Bayesian Optimization
Antonio Candelieri, Francesco Archetti
TL;DR
This work addresses the gap between Bayesian Optimization (BO) theory and practical deployments by introducing a Wasserstein-distance-based, model-free characterization of the current design via two distributional metrics: $\mathcal{S}_1(\mathcal{D})=\mathcal{W}_2^2(X,G_\mathcal{X})$ and $\mathcal{S}_2(\mathcal{D})=\mathcal{W}_2^2(Y,\delta_{y^+})$. By linking these metrics to the quality of myopic decisions, the authors show how coverage of the search space and concentration of observed values relate to misspecification risk and potential improvement, independently of any specific surrogate model. Empirical results indicate that the simple SR acquisition is robust across designs and that the proposed metrics correlate with RMSE and immediate improvement, suggesting a path toward adaptive acquisition strategies that switch based on distributional properties. The work points toward a new generation of acquisition functions that explicitly leverage $\mathcal{S}_1(\mathcal{D})$ and $\mathcal{S}_2(\mathcal{D})$ to make BO more robust in expensive, real-world tasks.
Abstract
Impractical assumptions, an inherently myopic nature, and the crucial role of the initial design together make theoretical convergence proofs of little value in real-life Bayesian Optimization applications. In this paper, we propose a novel characterization of the design based on its distributional properties, measured separately with respect to the coverage of the search space and the concentration around the best observed function value. These measures are based on the Wasserstein distance and enable a model-free evaluation of the information value of the design before deciding the next query. Then, embracing the myopic nature of Bayesian Optimization, we take an empirical approach to analyze the relation between the proposed characterization of the design and the quality of the next query. Ultimately, we provide insights that might inspire the definition of a new generation of acquisition functions in Bayesian Optimization.
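As a rough illustration of the two measures, the sketch below computes them for a toy design under several simplifying assumptions that are not taken from the paper: the search space is one-dimensional, $G_\mathcal{X}$ is treated as a uniform midpoint grid with as many points as the design, and the problem is a minimization, so $y^+$ is the smallest observed value. The function names `s1_coverage_1d` and `s2_concentration` are hypothetical. Two standard facts are used: the squared 2-Wasserstein distance to a Dirac mass reduces to the mean squared deviation from its location, and in one dimension the optimal coupling between two equal-size, equally weighted samples matches them in sorted order.

```python
def s2_concentration(y, y_best=None):
    """W_2^2 between the empirical distribution of y and a Dirac at y_best.

    The only coupling with a Dirac mass is the product coupling, so the
    distance is the mean squared deviation of the observations from y_best.
    """
    if y_best is None:
        y_best = min(y)  # assumption: minimization, so y+ is the smallest value
    return sum((v - y_best) ** 2 for v in y) / len(y)


def s1_coverage_1d(x, lower=0.0, upper=1.0):
    """W_2^2 between a 1D design x and a uniform midpoint grid on [lower, upper].

    Assumption: the reference G_X is a grid with as many points as the design.
    For equal-size, equally weighted 1D samples, sorted matching is optimal.
    """
    n = len(x)
    grid = [lower + (upper - lower) * (i + 0.5) / n for i in range(n)]
    xs = sorted(x)
    return sum((a - b) ** 2 for a, b in zip(xs, grid)) / n


if __name__ == "__main__":
    spread = [0.1, 0.4, 0.6, 0.9]        # design covering [0, 1]
    clustered = [0.48, 0.5, 0.52, 0.55]  # design concentrated near 0.5
    print("S1 spread   :", s1_coverage_1d(spread))
    print("S1 clustered:", s1_coverage_1d(clustered))
    print("S2          :", s2_concentration([1.0, 2.0, 3.0]))
```

Under these assumptions a lower first value means the design covers the search space more evenly, and a lower second value means the observed function values are more concentrated around the best one. In more than one dimension, or with a grid whose size differs from that of the design, a general optimal transport solver would be needed in place of the sorted matching.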
