Nonmyopic Global Optimisation via Approximate Dynamic Programming

Filippo Airaldi, Bart De Schutter, Azita Dabiri

TL;DR

This work tackles global optimisation (GO) of expensive, gradient-free functions by extending nonmyopic acquisition concepts from Bayesian to deterministic surrogate frameworks. It develops dynamic programming-based lookahead strategies (rollout and multi-step scenario-based optimisation) tailored to inverse distance weighting (IDW) and radial basis function (RBF) surrogates, including explicit surrogate dynamics and a sampling mechanism that yields GP-like posteriors. A carefully crafted reward function and horizon-based acquisitions enable planning over multiple future evaluations, leading to improved convergence over traditional myopic methods on both synthetic benchmarks and real-world hyperparameter tuning, including data-driven model predictive control (MPC) tuning for a chemical reactor. The results demonstrate that nonmyopic deterministic GO can outperform greedy (myopic) GO in final optimisation quality, at the cost of higher computational demand, which can be mitigated with parallel hardware and scalable sampling strategies.

Abstract

Unconstrained global optimisation aims to optimise expensive-to-evaluate black-box functions without gradient information. Bayesian optimisation, one of the most well-known techniques, typically employs Gaussian processes as surrogate models, leveraging their probabilistic nature to balance exploration and exploitation. However, Gaussian processes become computationally prohibitive in high-dimensional spaces. Recent alternatives, based on inverse distance weighting (IDW) and radial basis functions (RBFs), offer competitive, computationally lighter solutions. Despite their efficiency, both traditional global and Bayesian optimisation strategies suffer from the myopic nature of their acquisition functions, which focus solely on immediate improvement and neglect the future implications of the sequential decision-making process. Nonmyopic acquisition functions devised for the Bayesian setting have shown promise in improving long-term performance. Yet their use in deterministic strategies with IDW and RBFs remains unexplored. In this work, we introduce novel nonmyopic acquisition strategies tailored to IDW- and RBF-based global optimisation. Specifically, we develop dynamic programming-based paradigms, including rollout and multi-step scenario-based optimisation schemes, to enable lookahead acquisition. These methods optimise a sequence of query points over a horizon (instead of only the next query point) by predicting the evolution of the surrogate model, thereby managing the exploration-exploitation trade-off systematically via optimisation techniques. The proposed approach represents a significant advance in extending nonmyopic acquisition principles, previously confined to Bayesian optimisation, to the deterministic framework. Empirical results on synthetic and hyperparameter tuning benchmark problems demonstrate that these nonmyopic methods outperform conventional myopic approaches.
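
As a rough illustration of the IDW surrogates mentioned in the abstract, the following minimal sketch implements classical inverse distance weighting interpolation with weights 1/d^2. It is not the paper's implementation: the function name `idw_predict`, the `eps` tolerance and the toy data are assumptions made for illustration, and the surrogate and acquisition functions used in the paper may differ in detail.

```python
import numpy as np


def idw_predict(x, X, y, eps=1e-12):
    """Classical IDW prediction at a query point x (hypothetical helper).

    X has shape (n, d) and holds the sampled points, y has shape (n,) and
    holds the corresponding function values.
    """
    x = np.asarray(x, dtype=float)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    # Squared Euclidean distance from the query to every sampled point.
    d2 = np.sum((X - x) ** 2, axis=1)

    # At (numerically) an already sampled point, return its value exactly,
    # which avoids a division by zero and makes the surrogate interpolating.
    hit = d2 < eps
    if hit.any():
        return float(y[hit][0])

    # Weights decay with the inverse squared distance; the prediction is the
    # weight-normalised average of the observed values.
    w = 1.0 / d2
    return float(w @ y / w.sum())


# Toy one-dimensional data set (illustrative only).
X_samples = np.array([[0.0], [1.0]])
y_samples = np.array([0.0, 2.0])
midpoint_value = idw_predict([0.5], X_samples, y_samples)
```

Evaluating such a surrogate needs no matrix factorisation, in contrast to Gaussian process regression, which is one reason IDW-type models are regarded as computationally light.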

Paper Structure

This paper contains 18 sections, 30 equations, 4 figures, 1 table, 1 algorithm.

Figures (4)

  • Figure 1: Depiction of the tree structure induced by the multi-step sampling of the disturbances, whereas the actions to be optimised remain constant within the same stage.
  • Figure 2: Average gap (aggregated over all problems) versus the mean time per iteration $\pm$ the corresponding standard errors for rollout (R) and multi-step (MS) methods with different horizon lengths and sampling schemes. Note that the time axis is in logarithmic scale and has a discontinuity.
  • Figure 3: Comparison of the evolution of the optimality gap during optimisation (excluding the $N_0$ warm-up iterations) of each problem for the myopic method (dashed) and the best mean-performing nonmyopic method (solid), as reported in Table 1. The colour scheme follows that of Figure 2, and averages and $95\%$ confidence intervals are computed over the 30 repetitions.
  • Figure 4: (a) Comparison of the evolution of the optimality gap during optimisation (excluding the $N_0$ warm-up iterations) of the MPC tuning problem for the rollout (R) and multi-step (MS) methods. The colour scheme follows that of Figure 2, and averages and $95\%$ confidence intervals are computed over the 30 repetitions. (b) Evolution of the reactor temperature $T_\text{R}$, under the MPC policy, during initial (dotted), middle (dashed) and final (solid) iterations of the R-4 (GH) method, for each of the 30 repetitions of the experiment. Its lower bounds are shown in red.

Theorems & Definitions (2)

  • Remark 1
  • Remark 2