Bayesian Optimization for Non-Convex Two-Stage Stochastic Optimization Problems

Jack M. Buckingham, Ivo Couckuyt, Juergen Branke

TL;DR

The paper tackles expensive, black-box, two-stage stochastic optimization with a joint knowledge-gradient acquisition function (jKG) for Bayesian optimization, which simultaneously learns the here-and-now design ${\bm x}$ and a wait-and-see policy ${\bm g}$. It establishes asymptotic consistency, showing that $\bar f({\bm x}^{*n},{\bm g}^{*n})$ converges to the optimum as $n\to\infty$, and develops a computationally tractable approximation using discrete inner spaces and quasi-Monte Carlo methods. Two further strategies are formulated: an alternating KG (aKG), which uses fewer approximations and alternates its focus between the two variable types, and a two-step KG (2sKG), which serves as a benchmark. Empirically, jKG and aKG outperform 2sKG and the Sobol'/random sampling baselines on synthetic GP-sampled problems and on real-world scenarios (optical table and supply chain), converging faster and reaching better final costs, while also returning a concrete policy ${\bm g}^*(\cdot)$ alongside the fixed design. The methods target sample-efficient optimization of expensive two-stage problems in engineering and operations research.

Abstract

Bayesian optimization is a sample-efficient method for solving expensive, black-box optimization problems. Stochastic programming concerns optimization under uncertainty where, typically, average performance is the quantity of interest. In the first stage of a two-stage problem, here-and-now decisions must be made in the face of uncertainty, while in the second stage, wait-and-see decisions are made after the uncertainty has been resolved. Many methods in stochastic programming assume that the objective is cheap to evaluate and linear or convex. We apply Bayesian optimization to solve non-convex, two-stage stochastic programs which are black-box and expensive to evaluate as, for example, is often the case with simulation objectives. We formulate a knowledge-gradient-based acquisition function to jointly optimize the first- and second-stage variables, establish a guarantee of asymptotic consistency, and provide a computationally efficient approximation. We demonstrate comparable empirical results to an alternative we formulate with fewer approximations, which alternates its focus between the two variable types, and superior empirical results over the state of the art and the standard, naïve, two-step benchmark.
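
To make the two-stage structure concrete, the following minimal sketch estimates the quantity being optimized, $\mathbb{E}_{u}[\min_{y} f(x, y, u)]$, for a toy problem. It is not the paper's algorithm (there is no Gaussian process or knowledge gradient here). The objective `f`, the grids, the sample sizes, and the use of a van der Corput sequence as the quasi-Monte Carlo point set are all assumptions made for illustration, and the toy objective is deliberately convex so that the result can be checked by hand.

```python
def van_der_corput(n, base=2):
    """First n points of the van der Corput low-discrepancy sequence in [0, 1)."""
    points = []
    for i in range(1, n + 1):
        q, denom, value = i, 1.0, 0.0
        while q > 0:
            denom *= base
            q, r = divmod(q, base)
            value += r / denom
        points.append(value)
    return points


def f(x, y, u):
    """Hypothetical toy cost: x is here-and-now, y is wait-and-see, u is uncertain."""
    return (x + y - u) ** 2 + 0.5 * y ** 2


def cost_with_recourse(x, u_samples, y_grid):
    """Average over u of the best second-stage cost, with y chosen after u is seen."""
    return sum(min(f(x, y, u) for y in y_grid) for u in u_samples) / len(u_samples)


def cost_without_recourse(x, u_samples, y_grid):
    """Best average cost when a single y must be fixed before u is seen."""
    return min(sum(f(x, y, u) for u in u_samples) / len(u_samples) for y in y_grid)


u_samples = van_der_corput(256)                      # quasi-Monte Carlo draws of u
y_grid = [-1.0 + 2.0 * k / 200 for k in range(201)]  # discrete second-stage space
x_grid = [k / 20 for k in range(21)]                 # candidate first-stage designs

costs = [cost_with_recourse(x, u_samples, y_grid) for x in x_grid]
best_cost = min(costs)
best_x = x_grid[costs.index(best_cost)]
fixed_y_cost = cost_without_recourse(best_x, u_samples, y_grid)

print(f"best x = {best_x}, cost with recourse = {best_cost:.4f}, "
      f"cost with fixed y = {fixed_y_cost:.4f}")
```

For this toy objective with $u$ uniform on $[0, 1]$, the inner minimum works out to $(u - x)^2/3$, so the best design is $x = 0.5$ with expected cost $1/36$. Fixing $y$ in advance costs roughly three times as much, which is the gap a wait-and-see policy is meant to close.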

Paper Structure (40 sections, 17 theorems, 94 equations, 7 figures, 8 tables, 5 algorithms)

Key Result

Proposition 3.1

The joint knowledge gradient is non-negative.

Figures (7)

  • Figure 1: A contour plot for a non-convex two-stage stochastic optimization problem at three different values of the fixed design, ${\bm{x}}$. The solid blue line shows the optimal second-stage decision, ${\bm{y}} = {\bm{g}}({\bm{u}})$. The aim is to find the fixed design with the best objective value after taking the expectation over the environmental variable.
  • Figure 2: Evolution of the expected simple regret for GP sampled problems of different dimensions, with gray shading showing the evaluations attributed to the initial design. Note that the joint and alternating algorithms only require the first initial design phase, while the two-step algorithms require both. The colored shaded regions around each line indicate two standard errors either side of the mean (an approximate 95% confidence interval). Joint KG and alternating KG perform well in all cases, with joint KG approaching zero regret slightly faster than alternating KG. Both algorithms outperform the two-step and random sampling benchmark algorithms. The two-step algorithms do particularly badly when the $({\bm{x}},{\bm{y}},{\bm{u}})$-dimension is $(4,1,1)$ since the first step only optimizes the adjustable variables while the bulk of the optimization problem is in the fixed design space.
  • Figure 3: Evolution of the expected simple regret for GP sampled problems of different length scales, with gray shading showing the evaluations attributed to the initial design. Note that the joint and alternating algorithms only require the first initial design phase, while the two-step algorithms require both. The colored shaded regions around each line indicate two standard errors either side of the mean (an approximate 95% confidence interval). Joint KG and alternating KG are consistently the best performers, and the two-step algorithms perform particularly poorly when the short length scale is in the ${\bm{x}}$-dimension.
  • Figure 4: Evolution of the expected simple regret for GP sampled problems with additive Gaussian observation noise of standard deviation $\sigma=2$. The problems have $({\bm{x}}, {\bm{y}}, {\bm{u}})$-dimension $(2, 2, 2)$ and come from the same distribution as those which appear in Figure 2. Gray shading shows the evaluations attributed to the initial design. Note that the joint and alternating algorithms only require the first initial design phase, while the two-step algorithms require both. The colored shaded regions around each line indicate two standard errors either side of the mean (an approximate 95% confidence interval). While the performance of all algorithms gets worse with increased levels of observation noise, the joint and alternating KG continue to outperform the Sobol' sequence and two-step benchmarks.
  • Figure 5: Schematic diagram and evolution of the simple regret for the optical table experiment. Gray shading shows the evaluations attributed to the initial design. Note that the joint and alternating algorithms only require the first initial design phase, while the two-step algorithms require both. The colored shaded regions around each line indicate two standard errors either side of the mean (an approximate 95% confidence interval). The joint and alternating KG algorithms are the fastest to converge to zero regret.
  • ...and 2 more figures
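
The shaded bands described in the captions above (two standard errors either side of the mean) can be computed as in the following sketch. The function name `mean_and_band` and the input layout (one regret curve per independent run) are assumptions for illustration, and the numbers in the usage example are made up rather than taken from the paper.

```python
import math
import statistics


def mean_and_band(runs, width=2.0):
    """Per-step mean and mean +/- `width` standard errors across independent runs.

    `runs` is a list of equal-length regret curves, one per replication.
    """
    n_runs = len(runs)
    means, lower, upper = [], [], []
    for step_values in zip(*runs):  # iterate over evaluation steps
        mean = statistics.fmean(step_values)
        # Standard error of the mean uses the sample standard deviation.
        std_err = statistics.stdev(step_values) / math.sqrt(n_runs)
        means.append(mean)
        lower.append(mean - width * std_err)
        upper.append(mean + width * std_err)
    return means, lower, upper


# Made-up regret curves from two hypothetical replications.
example_runs = [[1.0, 0.5], [3.0, 1.5]]
means, lower, upper = mean_and_band(example_runs)
print(means, lower, upper)
```

With many replications the band is an approximate 95% confidence interval for the expected regret; with only a handful of runs, as in this example, it is much rougher.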

Theorems & Definitions (35)

  • Proposition 3.1
  • Theorem 3.2
  • Theorem A.1
  • Lemma A.1
  • proof
  • Lemma A.2
  • proof
  • Lemma A.3
  • proof
  • Proposition A.4: Proposition 3.1 from the main text
  • ...and 25 more