Bayesian Optimization for Non-Convex Two-Stage Stochastic Optimization Problems

Jack M. Buckingham; Ivo Couckuyt; Juergen Branke

TL;DR

The paper tackles expensive, black-box two-stage stochastic optimization by introducing a joint knowledge-gradient acquisition function for Bayesian optimization (jKG) that simultaneously learns the here-and-now design ${\bm x}$ and a wait-and-see policy ${\bm g}$. It proves asymptotic consistency, showing that $\bar f({\bm x}^{*n},{\bm g}^{*n})$ converges to the optimum as $n\to\infty$, and develops a computationally tractable approximation using discrete inner spaces and quasi-Monte Carlo methods. Two further algorithms are formulated: an alternating KG (aKG), which uses fewer approximations and alternates its focus between the two variable types, and a two-step KG (2sKG), which serves as the standard, naïve benchmark. Empirically, jKG and aKG outperform 2sKG and Sobol/random baselines across synthetic GP landscapes and real-world scenarios (optical table and supply chain), achieving faster convergence and better final costs, while also delivering a concrete policy ${\bm g}^*(\cdot)$ alongside the fixed design. The methods make optimization of expensive two-stage problems more sample-efficient, with applications in engineering and operations research.

Abstract

Bayesian optimization is a sample-efficient method for solving expensive, black-box optimization problems. Stochastic programming concerns optimization under uncertainty where, typically, average performance is the quantity of interest. In the first stage of a two-stage problem, here-and-now decisions must be made in the face of uncertainty, while in the second stage, wait-and-see decisions are made after the uncertainty has been resolved. Many methods in stochastic programming assume that the objective is cheap to evaluate and linear or convex. We apply Bayesian optimization to solve non-convex, two-stage stochastic programs which are black-box and expensive to evaluate as, for example, is often the case with simulation objectives. We formulate a knowledge-gradient-based acquisition function to jointly optimize the first- and second-stage variables, establish a guarantee of asymptotic consistency, and provide a computationally efficient approximation. We demonstrate comparable empirical results to an alternative we formulate with fewer approximations, which alternates its focus between the two variable types, and superior empirical results over the state of the art and the standard, naïve, two-step benchmark.
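The two-stage structure described in the abstract can be illustrated with a small sketch. It is not the paper's method: the objective `f`, the uniform distribution of the environmental variable `u`, the shared grid, and the choice to maximize are all assumptions made for illustration. The sketch only shows how a here-and-now design `x` is scored by averaging over `u` the value obtained when the wait-and-see decision `y` is chosen after `u` is known.

```python
import math
import random


def f(x, y, u):
    """Hypothetical non-convex toy objective (not taken from the paper)."""
    return math.sin(3.0 * x) * math.cos(2.0 * (y - u)) - (y - 0.5 * x - u) ** 2


def best_second_stage(x, u, y_grid):
    """Wait-and-see decision: pick the best y on the grid once u is known."""
    return max(y_grid, key=lambda y: f(x, y, u))


def two_stage_value(x, u_samples, y_grid):
    """Monte Carlo estimate of E_u[max_y f(x, y, u)] for a fixed design x."""
    total = 0.0
    for u in u_samples:
        y_star = best_second_stage(x, u, y_grid)
        total += f(x, y_star, u)
    return total / len(u_samples)


def fixed_policy_value(x, y_fixed, u_samples):
    """Average value when y must also be fixed before u is revealed."""
    return sum(f(x, y_fixed, u) for u in u_samples) / len(u_samples)


rng = random.Random(0)
u_samples = [rng.uniform(0.0, 1.0) for _ in range(200)]  # assumed u ~ U(0, 1)
grid = [i / 50 for i in range(51)]  # shared grid on [0, 1] for x and y

# Brute-force search over x; a real problem would be too expensive for this.
x_best = max(grid, key=lambda x: two_stage_value(x, u_samples, grid))
adaptive = two_stage_value(x_best, u_samples, grid)
static = max(fixed_policy_value(x_best, y, u_samples) for y in grid)
print(f"x* = {x_best:.2f}, adaptive = {adaptive:.3f}, static = {static:.3f}")
```

On the same samples, the adaptive value can never be lower than the static one, because each term of the average is maximized separately. The brute-force search evaluates `f` several hundred thousand times, which is exactly what an expensive black-box objective rules out and what motivates a sample-efficient approach.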

Paper Structure (40 sections, 17 theorems, 94 equations, 7 figures, 8 tables, 5 algorithms)


Introduction
Problem statement
Related work
Background
Gaussian processes
Bayesian optimization
Knowledge gradient
Discrete approximation of knowledge gradient
Knowledge gradient for two-stage problems
Asymptotic consistency of the recommendations as estimators of the supremum
Efficient computation and optimization
An alternative, alternating policy
Improving the fixed design
Improving the adjustable variables
The two-step incumbent method
...and 25 more sections

Key Result

Proposition 3.1

The joint knowledge gradient is non-negative.
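The non-negativity stated above has a familiar analogue in the standard, single-stage discrete knowledge gradient, which the following sketch illustrates. It is not the paper's joint acquisition function: the function name `discrete_kg`, the inputs `mu` and `sigma_tilde`, and the equal-probability point set (a simple deterministic stand-in for a quasi-Monte Carlo sequence) are assumptions made for illustration. The quantity estimated is the expected increase in the maximum posterior mean over a finite candidate set after one hypothetical observation.

```python
from statistics import NormalDist


def discrete_kg(mu, sigma_tilde, n_pairs=128):
    """Estimate E_Z[max_i(mu_i + sigma_tilde_i * Z)] - max_i mu_i, Z ~ N(0, 1).

    mu          : posterior means at a finite set of candidate points
    sigma_tilde : change in each posterior mean per unit of Z after one
                  hypothetical new observation
    """
    std_normal = NormalDist()
    best_now = max(mu)
    total = 0.0
    for k in range(n_pairs):
        # Midpoint of the k-th equal-probability stratum in the upper half
        # of (0, 1), mapped through the inverse normal CDF.
        p = 0.5 + (k + 0.5) / (2 * n_pairs)
        z = std_normal.inv_cdf(p)
        for signed_z in (z, -z):  # antithetic pair keeps the sample symmetric
            total += max(m + s * signed_z for m, s in zip(mu, sigma_tilde))
    return total / (2 * n_pairs) - best_now


# Candidates whose means react differently to the observation: positive value.
print(discrete_kg([0.2, 0.5, 0.4], [0.3, 0.0, -0.2]))
# An uninformative observation leaves the maximum unchanged: zero value.
print(discrete_kg([0.2, 0.5, 0.4], [0.0, 0.0, 0.0]))
```

The symmetric point set is a deliberate choice: for each pair `(z, -z)`, the two maxima sum to at least twice the current best mean, so the estimate itself is non-negative up to floating-point rounding, mirroring the Jensen-type argument behind the exact result.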

Figures (7)

Figure 1: A contour plot for a non-convex two-stage stochastic optimization problem at three different values of the fixed design, ${\bm{x}}$. The solid blue line shows the optimal second-stage decision, ${\bm{y}} = {\bm{g}}({\bm{u}})$. The aim is to find the fixed design with the best objective value after taking the expectation over the environmental variable.
Figure 2: Evolution of the expected simple regret for GP sampled problems of different dimensions, with gray shading showing the evaluations attributed to the initial design. Note that the joint and alternating algorithms only require the first initial design phase, while the two-step algorithms require both. The colored shaded regions around each line indicate two standard errors either side of the mean (an approximate 95% confidence interval). Joint KG and alternating KG perform well in all cases, with joint KG approaching zero regret slightly faster than alternating KG. Both algorithms outperform the two-step and random sampling benchmark algorithms. The two-step algorithms do particularly badly when the $({\bm{x}},{\bm{y}},{\bm{u}})$-dimension is $(4,1,1)$ since the first step only optimizes the adjustable variables while the bulk of the optimization problem is in the fixed design space.
Figure 3: Evolution of the expected simple regret for GP sampled problems of different length scales, with gray shading showing the evaluations attributed to the initial design. Note that the joint and alternating algorithms only require the first initial design phase, while the two-step algorithms require both. The colored shaded regions around each line indicate two standard errors either side of the mean (an approximate 95% confidence interval). Joint KG and alternating KG are consistently the best performers, and the two-step algorithms perform particularly poorly when the short length scale is in the ${\bm{x}}$-dimension.
Figure 4: Evolution of the expected simple regret for GP sampled problems with additive Gaussian observation noise of standard deviation $\sigma=2$. The problems have $({\bm{x}}, {\bm{y}}, {\bm{u}})$-dimension $(2, 2, 2)$ and come from the same distribution as those which appear in Figure 2. Gray shading shows the evaluations attributed to the initial design. Note that the joint and alternating algorithms only require the first initial design phase, while the two-step algorithms require both. The colored shaded regions around each line indicate two standard errors either side of the mean (an approximate 95% confidence interval). While the performance of all algorithms gets worse with increased levels of observation noise, the joint and alternating KG continue to outperform the Sobol' sequence and two-step benchmarks.
Figure 5: Schematic diagram and evolution of the simple regret for the optical table experiment. Gray shading shows the evaluations attributed to the initial design. Note that the joint and alternating algorithms only require the first initial design phase, while the two-step algorithms require both. The colored shaded regions around each line indicate two standard errors either side of the mean (an approximate 95% confidence interval). The joint and alternating KG algorithms are the fastest to converge to zero regret.
...and 2 more figures

Theorems & Definitions (35)

Proposition 3.1
Theorem 3.2
Theorem A.1
Lemma A.1
proof
Lemma A.2
proof
Lemma A.3
proof
Proposition A.4: Proposition 3.1 from the main text
...and 25 more
