Bayesian Optimization for Non-Convex Two-Stage Stochastic Optimization Problems
Jack M. Buckingham, Ivo Couckuyt, Juergen Branke
TL;DR
The paper tackles expensive, black-box two-stage stochastic optimization by introducing a joint knowledge-gradient-based Bayesian optimization method (jKG) that simultaneously learns the here-and-now design $\mathbf{x}$ and a wait-and-see policy $\mathbf{g}$. It provides a theoretical consistency result showing that $\bar f(\mathbf{x}^{*n},\mathbf{g}^{*n})$ converges to the optimum as $n\to\infty$, and develops a computationally tractable approximation using discrete inner spaces and quasi-Monte Carlo methods. An alternating KG (aKG) and a two-step KG (2sKG) are also proposed for scalability and benchmarking. Empirically, jKG and aKG outperform 2sKG and Sobol/random baselines on synthetic GP landscapes and real-world problems (optical table and supply chain), converging faster and reaching lower final costs, while also delivering a concrete policy $\mathbf{g}^*(\cdot)$ alongside the fixed design. The methods advance sample-efficient optimization for expensive two-stage problems, with practical relevance to engineering and operations research.
Abstract
Bayesian optimization is a sample-efficient method for solving expensive, black-box optimization problems. Stochastic programming concerns optimization under uncertainty where, typically, average performance is the quantity of interest. In the first stage of a two-stage problem, here-and-now decisions must be made in the face of uncertainty, while in the second stage, wait-and-see decisions are made after the uncertainty has been resolved. Many methods in stochastic programming assume that the objective is cheap to evaluate and linear or convex. We apply Bayesian optimization to solve non-convex, two-stage stochastic programs which are black-box and expensive to evaluate, as is often the case, for example, with simulation objectives. We formulate a knowledge-gradient-based acquisition function to jointly optimize the first- and second-stage variables, establish a guarantee of asymptotic consistency, and provide a computationally efficient approximation. Empirically, this approach performs comparably to an alternative that we formulate with fewer approximations, which alternates its focus between the two variable types, and it outperforms both the state of the art and the standard, naïve, two-step benchmark.
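To make the two-stage structure concrete, the following is a minimal sketch of a toy problem, not the paper's method. It uses exhaustive grid search rather than Bayesian optimization or a knowledge-gradient acquisition function, and every name in it is an assumption introduced for illustration: the objective `f(x, y, u)`, the first-stage variable `x`, the second-stage variable `y`, the uncertain parameter `u`, and the grid sizes. It compares the sample-average cost when `y` is chosen after `u` is observed (a wait-and-see policy $\mathbf{g}(u)$) with the cost when a single `y` must be fixed in advance. Scenarios for `u` are drawn from a base-2 van der Corput sequence, a simple one-dimensional low-discrepancy sequence used here as a stand-in for quasi-Monte Carlo sampling.

```python
import math


def van_der_corput(i, base=2):
    """Return the i-th point of the van der Corput low-discrepancy sequence in [0, 1)."""
    q, denom = 0.0, 1.0
    while i > 0:
        i, rem = divmod(i, base)
        denom *= base
        q += rem / denom
    return q


def f(x, y, u):
    """Hypothetical toy cost, non-convex in (x, y). Not taken from the paper."""
    return (x - 0.3) ** 2 + (y - u) ** 2 + 0.1 * math.sin(6.0 * x * y) ** 2


def two_stage_cost(x, scenarios, y_grid):
    """Sample-average cost when y is re-optimized for each scenario u (wait-and-see)."""
    return sum(min(f(x, y, u) for y in y_grid) for u in scenarios) / len(scenarios)


def fixed_y_cost(x, scenarios, y_grid):
    """Sample-average cost when a single y is committed to before u is observed."""
    return min(sum(f(x, y, u) for u in scenarios) / len(scenarios) for y in y_grid)


# Quasi-random scenarios for the uncertain parameter u, assumed uniform on [0, 1).
scenarios = [van_der_corput(i) for i in range(1, 17)]
# Coarse grid shared by x and y; a discretized stand-in for continuous search.
grid = [j / 20 for j in range(21)]

best_two_stage = min(two_stage_cost(x, scenarios, grid) for x in grid)
best_fixed = min(fixed_y_cost(x, scenarios, grid) for x in grid)

print(f"wait-and-see policy: {best_two_stage:.4f}")
print(f"single fixed y:      {best_fixed:.4f}")
```

Because the inner minimum is taken separately for each scenario, the wait-and-see cost can never exceed the fixed-`y` cost on the same scenarios. The gap between the two printed values is the benefit of adapting the second-stage decision in this toy setting. In the expensive, black-box setting the abstract describes, each call to `f` would be costly, which is why exhaustive search of this kind is replaced by a sample-efficient method.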
