BO4IO: A Bayesian optimization approach to inverse optimization with uncertainty quantification

Yen-An Lu, Wei-Shou Hu, Joel A. Paulson, Qi Zhang

TL;DR

BO4IO introduces a derivative-free Bayesian optimization method for data-driven inverse optimization by modeling the IO loss with a Gaussian process surrogate and solving lower-level forward problems to evaluate the loss. Using a lower confidence bound acquisition, it efficiently identifies IO parameters even when forward problems are nonconvex or include discrete decisions, and it leverages the GP posterior to approximate the profile likelihood for uncertainty quantification. The method is validated on flux balance analysis and pooling problems, showing data-efficient parameter recovery and meaningful identifiability assessments via approximate confidence intervals. This contributes a practical, uncertainty-aware tool for IO in complex optimization tasks with broad applicability to systems biology and operations research.

Abstract

This work addresses data-driven inverse optimization (IO), where the goal is to estimate unknown parameters in an optimization model from observed decisions that can be assumed to be optimal or near-optimal solutions to the optimization problem. The IO problem is commonly formulated as a large-scale bilevel program that is notoriously difficult to solve. Deviating from traditional exact solution methods, we propose a derivative-free optimization approach based on Bayesian optimization, which we call BO4IO, to solve general IO problems. We treat the IO loss function as a black box and approximate it with a Gaussian process model. Using the predicted posterior function, an acquisition function is minimized at each iteration to query new candidate solutions and sequentially converge to the optimal parameter estimates. The main advantages of using Bayesian optimization for IO are two-fold: (i) it circumvents the need for complex reformulations of the bilevel program or specialized algorithms and can hence enable computational tractability even when the underlying optimization problem is nonconvex or involves discrete variables, and (ii) it allows approximations of the profile likelihood, which provide uncertainty quantification on the IO parameter estimates. We apply the proposed method to three computational case studies, covering different classes of forward optimization problems ranging from convex nonlinear to nonconvex mixed-integer nonlinear programs. Our extensive computational results demonstrate the efficacy and robustness of BO4IO in accurately estimating unknown model parameters from small and noisy datasets. In addition, the proposed profile likelihood analysis proves effective in providing good approximations of the confidence intervals on the parameter estimates and in assessing the identifiability of the unknown parameters.
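The loop described in the abstract (evaluate the IO loss by solving forward problems, fit a Gaussian process to the observed losses, minimize a lower confidence bound to pick the next parameter candidate) can be illustrated with a minimal sketch. This is not the authors' implementation. The one-dimensional forward problem `solve_forward`, the hand-rolled zero-mean GP with fixed RBF hyperparameters, the grid-based acquisition minimization, and all names and default values below are assumptions chosen only to keep the example self-contained.

```python
import numpy as np

def solve_forward(theta, u):
    """Hypothetical forward problem: min over 0 <= x <= 1 of x^2 - theta*u*x.
    The minimizer is available in closed form, so no solver is needed here."""
    return np.clip(theta * u / 2.0, 0.0, 1.0)

def io_loss(theta, u, x_obs):
    """Decision-error IO loss: mean squared gap between observed decisions
    and the forward-problem solutions under the candidate theta."""
    return float(np.mean((x_obs - solve_forward(theta, u)) ** 2))

def rbf(a, b, length=0.3):
    """Unit-variance RBF kernel between two 1-D arrays (fixed length scale)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-4):
    """Posterior mean and standard deviation of a zero-mean GP."""
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf(x_train, x_query)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    # The prior variance is 1.0 because the kernel has unit variance.
    var = np.clip(1.0 - np.sum(v ** 2, axis=0), 1e-12, None)
    return mean, np.sqrt(var)

def bo4io_sketch(u, x_obs, bounds=(0.0, 2.0), n_init=3, n_iter=20,
                 beta=2.0, seed=0):
    """Bayesian optimization of the IO loss with an LCB acquisition."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(bounds[0], bounds[1], 401)  # candidate thetas
    thetas = [float(t) for t in rng.uniform(bounds[0], bounds[1], size=n_init)]
    losses = [io_loss(t, u, x_obs) for t in thetas]
    for _ in range(n_iter):
        t_arr, y = np.array(thetas), np.array(losses)
        # Standardize the losses so the unit-variance prior is reasonable.
        y_std = (y - y.mean()) / (y.std() + 1e-12)
        mean, std = gp_posterior(t_arr, y_std, grid)
        lcb = mean - beta * std            # lower confidence bound
        theta_next = float(grid[np.argmin(lcb)])
        thetas.append(theta_next)
        losses.append(io_loss(theta_next, u, x_obs))  # solves forward problems
    best = int(np.argmin(losses))
    return thetas[best], losses[best]

if __name__ == "__main__":
    # Synthetic data from a hypothetical ground truth theta = 1.2.
    rng = np.random.default_rng(1)
    u = rng.uniform(0.5, 1.5, size=50)
    x_obs = solve_forward(1.2, u) + 0.01 * rng.standard_normal(50)
    print(bo4io_sketch(u, x_obs))
```

Only `io_loss` touches the forward problem, so swapping the closed-form `solve_forward` for a call to a nonconvex or mixed-integer solver would leave the outer loop unchanged, which is the tractability argument made in point (i) of the abstract.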

Paper Structure (19 sections, 15 equations, 9 figures, 1 table, 2 algorithms)

This paper contains 19 sections, 15 equations, 9 figures, 1 table, 2 algorithms.

Figures (9)

  • Figure 1: Illustrative example of outer-approximate (OA-CI, blue), original (CI, green), and inner-approximate (IA-CI, red) confidence intervals of a parameter of interest ($\hat{\theta}_k$) based on the proposed PL analysis.
  • Figure 2: Effect of the dimensionality ($d$) of $\theta$ on the accuracy of the estimated FOPs. Convergence analysis of (a) training error, (b) testing error, and (c) parameter error with varying $d$. Training and testing errors refer to the average standardized decision error defined as $\sum_{i\in\mathcal{I}}\sum_{k\in{\mathcal{R}}}(\bar{v}_{ik}-\hat{v}_{ik})^2/|\mathcal{I}|/|\mathcal{R}|$ and calculated based on the training and testing datasets, respectively, whereas the parameter error denotes the difference between the ground-truth ($\theta$) and the estimated ($\hat{\theta}^*$) values. Here, the solid lines and shaded areas respectively denote the medians and confidence intervals of the corresponding error across the 10 random instances. The synthetic dataset of each random instance is generated under the setting of $|\mathcal{I}| = 50$ and $\sigma=0.01$.
  • Figure 3: Approximation of profile likelihood and identifiability of parameter estimates in a two-dimensional FBA problem ($d=2$). a) and d) show the full-space $\widehat{\text{PL}}^\text{LCB}$ of $\theta_1$ and $\theta_2$, whereas b) and e) are the zoomed-in regions around the significance threshold level as defined in \ref{eqn:OA-CI}. c) and f) trace the changes in the upper and lower bounds of the OA-CIs of the two parameters over the BO iterations. The PL analysis is performed using $\Delta_{\alpha} = \chi^2(0.05, 1)$ and $\rho = 3.84$ (95% confidence level).
  • Figure 4: Impact of noise levels on BO4IO performance. (a) Testing decision error and (b) parameter error at 250 iterations under varying noise levels. Outer-approximation confidence intervals (OA-CIs) of two parameters, (c) $\theta_1$ and (d) $\theta_2$, under varying noise levels ($\sigma$). OA-CIs are derived using $\Delta_{\alpha} = \chi^2(0.05, 1)$ and $\rho = 3.84$ (95% confidence level).
  • Figure 5: Effect of network size on BO4IO performance. Convergence analysis of (a) training error, (b) testing error, and (c) parameter error for two benchmark problems; the numbers in parentheses in the legend denote the dimensionality $d$. Training and testing errors are defined as the average standardized decision error, $\sum_{i\in\mathcal{I}}(x_i-\hat{x}_i)^\top (x_i-\hat{x}_i)/|\mathcal{I}|/|\mathcal{R}|$, calculated on the training and testing datasets, respectively, whereas the parameter error denotes the difference between the ground-truth ($\theta$) and estimated ($\hat{\theta}^*$) values. Here, the solid lines and shaded areas respectively denote the medians and confidence intervals of the corresponding error across the 10 random instances. The synthetic dataset of each random instance is generated under the setting of $|\mathcal{I}| = 50$ and $\sigma=0.05$.
  • ...and 4 more figures