Targeted Variance Reduction: Robust Bayesian Optimization of Black-Box Simulators with Noise Parameters

John Joshua Miller, Simon Mak

TL;DR

Targeted Variance Reduction (TVR) is a new Bayesian optimization method that uses a novel joint acquisition function over $(\mathbf{x},\boldsymbol{\theta})$ to target variance reduction on the objective within the desired region of improvement.

Abstract

The optimization of a black-box simulator over control parameters $\mathbf{x}$ arises in a myriad of scientific applications. In such applications, the simulator often takes the form $f(\mathbf{x},\boldsymbol{\theta})$, where $\boldsymbol{\theta}$ are parameters that are uncertain in practice. Robust optimization aims to optimize the objective $\mathbb{E}[f(\mathbf{x},\boldsymbol{\Theta})]$, where $\boldsymbol{\Theta} \sim \mathcal{P}$ is a random variable that models uncertainty on $\boldsymbol{\theta}$. For this, existing black-box methods typically employ a two-stage approach for selecting the next point $(\mathbf{x},\boldsymbol{\theta})$, where $\mathbf{x}$ and $\boldsymbol{\theta}$ are optimized separately via different acquisition functions. As such, these approaches do not employ a joint acquisition over $(\mathbf{x},\boldsymbol{\theta})$, and thus may fail to fully exploit control-to-noise interactions for effective robust optimization. To address this, we propose a new Bayesian optimization method called Targeted Variance Reduction (TVR). TVR leverages a novel joint acquisition function over $(\mathbf{x},\boldsymbol{\theta})$, which targets variance reduction on the objective within the desired region of improvement. Under a Gaussian process surrogate on $f$, the TVR acquisition can be evaluated in closed form and reveals an insightful exploration-exploitation-precision trade-off for robust black-box optimization. TVR can further accommodate a broad class of non-Gaussian distributions $\mathcal{P}$ via a careful integration of normalizing flows. We demonstrate the improved performance of TVR over the state of the art in a suite of numerical experiments and an application to the robust design of automobile brake discs under operational uncertainty.
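Why a GP surrogate on $f$ can give closed-form quantities for the robust objective: $g(\mathbf{x}) = \mathbb{E}[f(\mathbf{x},\boldsymbol{\Theta})]$ is a linear functional of $f$, so its posterior is again Gaussian, and its posterior mean is the GP posterior mean of $f$ averaged over $\mathcal{P}$. The following is a minimal sketch under our own assumptions (a zero-mean GP with an ARD squared-exponential kernel over $(\mathbf{x},\boldsymbol{\theta})$ and a Gaussian $\mathcal{P} = \mathcal{N}(\mathbf{m}, \mathbf{S})$). With these assumptions, the $\boldsymbol{\theta}$-part of the kernel integrates against $\mathcal{P}$ analytically. The simulator `f_toy`, the hyperparameters, and the helper names are hypothetical. The sketch shows only this one ingredient and is not the TVR acquisition itself. It checks the closed form against Gauss-Hermite quadrature for a 1D $\boldsymbol{\theta}$.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def se_kernel(A, B, lengthscales, signal_var):
    """ARD squared-exponential kernel between rows of A (n, d) and B (m, d)."""
    diff = (A[:, None, :] - B[None, :, :]) / lengthscales
    return signal_var * np.exp(-0.5 * np.sum(diff ** 2, axis=-1))

def f_toy(x, theta):
    """Hypothetical simulator f(x, theta): a stand-in, not a test function from the paper."""
    return np.sin(3.0 * x[:, 0]) * np.cos(2.0 * theta[:, 0]) + 0.5 * x[:, 0] * theta[:, 0]

# Hypothetical GP hyperparameters and Gaussian noise distribution P = N(m, S).
ls_x, ls_t = np.array([0.3]), np.array([0.5])
signal_var, nugget = 1.0, 1e-3
m, S = np.array([0.2]), np.array([[0.1]])

# Fit the GP on f over the joint (x, theta) space.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(20, 1))
T = rng.multivariate_normal(m, S, size=20)
Z = np.hstack([X, T])
ls = np.concatenate([ls_x, ls_t])
K = se_kernel(Z, Z, ls, signal_var) + nugget * np.eye(len(Z))
alpha = np.linalg.solve(K, f_toy(X, T))  # K^{-1} y

def robust_posterior_mean(x):
    """Closed-form posterior mean of g(x) = E[f(x, Theta)] for Theta ~ N(m, S)."""
    Lam = np.diag(ls_t ** 2)
    kx = se_kernel(x, X, ls_x, signal_var)  # (q, n); carries the signal variance
    # |I + Lam^{-1} S|^{-1/2} * exp(-0.5 (theta_i - m)^T (Lam + S)^{-1} (theta_i - m))
    det_factor = np.linalg.det(np.eye(len(m)) + np.linalg.solve(Lam, S)) ** -0.5
    diff = T - m
    quad = np.einsum("ni,ij,nj->n", diff, np.linalg.inv(Lam + S), diff)
    k_theta = det_factor * np.exp(-0.5 * quad)  # theta-kernel integrated against P
    return (kx * k_theta) @ alpha

def quadrature_posterior_mean(x, deg=60):
    """Reference: average the GP posterior mean of f over Theta by Gauss-Hermite (1D theta)."""
    z, w = hermegauss(deg)  # nodes/weights for the weight function exp(-z^2 / 2)
    thetas = m[0] + np.sqrt(S[0, 0]) * z
    out = []
    for xi in x:
        pts = np.column_stack([np.full(deg, xi[0]), thetas])
        mu = se_kernel(pts, Z, ls, signal_var) @ alpha  # posterior mean of f(xi, theta)
        out.append(w @ mu / np.sqrt(2.0 * np.pi))
    return np.array(out)

x_query = np.linspace(0.0, 1.0, 5)[:, None]
print(np.abs(robust_posterior_mean(x_query) - quadrature_posterior_mean(x_query)).max())
```

The printed discrepancy should be at round-off level. For non-Gaussian $\mathcal{P}$, this Gaussian-integral shortcut no longer applies directly, which is the setting the abstract addresses with normalizing flows.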

Paper Structure (18 sections, 1 theorem, 25 equations, 8 figures, 1 table, 1 algorithm)

Key Result

Proposition 1

Suppose there are no duplicates in the set of points $\{\mathbf{x}_{(1)}, \cdots, \mathbf{x}_{(k)}, \mathbf{x}_n^*\}$. Given $k \geq 2$, the $k$-TVR acquisition function in eq:ktvr can then be simplified as: where $\Phi$ is the standard normal c.d.f., and: Here, $\boldsymbol{\mu}_{n,k} = (\mu_n(\mathbf{x}_{(1)}),\cdots,\mu_n(\mathbf{x}_{(k)}),\mu_n(\mathbf{x}_n^*))^\top$ is the posterior mean of

Figures (8)

  • Figure 1: [Left] Realizations of the test function $f(x,\Theta)$ in eq:mot for different samples of $\Theta$. Solid red curves mark the desired objective $g(x) = \mathbb{E}[f(x,\Theta)]$, and the dotted line shows the optimum $x^* = 0.05$. [Right] Visualizing the fitted GP model on $g(x)$ and its corresponding chosen solution $x_n^*$ for various compared methods. Here, solid curves mark the posterior mean $\mu_n(x)$, the shaded regions mark its 95% confidence region, and the dotted lines show the chosen solution $x_n^*$.
  • Figure 2: Visualizing the trigonometric test function $f(x,\theta)$ for different sample draws of $\Theta$ from noise distribution 1 (left) and 2 (right). Plotted in red is the desired objective function $g(x) = \mathbb{E}[f(x,\Theta)]$ to maximize.
  • Figure 3: Plotting the optimization gap $g(\mathbf{x}^*) - g(\mathbf{x}_n^*)$ against the number of function evaluations on $f$ for the 1D-1D trigonometric function experiment. The solid lines mark the average optimization gap over 100 trials, and the shaded regions mark its 10th and 90th quantiles.
  • Figure 4: Visualizing samples of $\Theta$ drawn from its complex correlated distribution in the 3D-3D Trid function experiment.
  • Figure 5: Plotting the optimization gap $g(\mathbf{x}^*)-g(\mathbf{x}_n^*)$ against the number of function evaluations on $f$ for the 3D-3D Trid function experiment. The solid lines mark the average optimization gap over 100 trials, and the shaded regions mark its 10th and 90th quantiles. The variance reduction method is omitted in these plots due to its large optimization gap.
  • ...and 3 more figures
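Figures 3 and 5 summarize each method by the average optimization gap $g(\mathbf{x}^*) - g(\mathbf{x}_n^*)$ over 100 trials, with a shaded band between the 10th and 90th quantiles. The following is a minimal sketch of that summary. It assumes the gaps are stored as a trials-by-evaluations array. The helper names, the toy objective `g_toy`, and the array itself are hypothetical, and the data are synthetic placeholders rather than results from the paper.

```python
import numpy as np

def optimization_gap(g, x_star, x_chosen):
    """g(x*) - g(x_n*) for a maximization problem; nonnegative when x* is the true maximizer."""
    return g(x_star) - g(x_chosen)

def summarize_gaps(gaps):
    """Summarize an optimization-gap array of shape (n_trials, n_evals).

    Returns the per-evaluation mean curve and the 10th/90th quantile band.
    """
    gaps = np.asarray(gaps, dtype=float)
    mean = gaps.mean(axis=0)
    lo, hi = np.quantile(gaps, [0.1, 0.9], axis=0)
    return mean, lo, hi

# Hypothetical objective with maximizer x* = 0.3, used only to exercise optimization_gap.
g_toy = lambda x: -(x - 0.3) ** 2
print(optimization_gap(g_toy, 0.3, 0.5))

# Synthetic placeholder: 100 trials x 30 evaluations of a decaying, nonnegative gap.
rng = np.random.default_rng(0)
n_evals = np.arange(1, 31)
gaps = np.exp(-0.1 * n_evals) * rng.lognormal(0.0, 0.5, size=(100, 30))
mean, lo, hi = summarize_gaps(gaps)
```

Quantile bands, unlike standard-error bars, show the spread of outcomes across trials directly, which suits skewed gap distributions.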

Theorems & Definitions (1)

  • Proposition 1