Simulating, Fast and Slow: Learning Policies for Black-Box Optimization

Fabio Valerio Massoli; Tim Bakker; Thomas Hehn; Tribhuvanesh Orekondy; Arash Behboodi

TL;DR

This work tackles black-box optimization where forward simulations are expensive or non-differentiable. It introduces a policy-based reinforcement learning framework that jointly decides when to retrain a differentiable local surrogate and how to sample new data, guided by an ensemble-based uncertainty signal. By integrating a local surrogate, a gradient-based optimizer, and an active-learning data acquisition policy, the approach achieves up to 90% fewer simulator calls while maintaining or improving optimization performance across benchmark functions and real-world simulators. The combination of policy-driven retraining and learned sampling strategies offers a data-efficient, scalable pathway for optimizing expensive simulators in scientific and engineering applications.

Abstract

In recent years, solving optimization problems involving black-box simulators has become a point of focus for the machine learning community due to their ubiquity in science and engineering. The simulators describe a forward process $f_{\mathrm{sim}}: (ψ, x) \rightarrow y$ from simulation parameters $ψ$ and input data $x$ to observations $y$, and the goal of the optimization problem is to find parameters $ψ$ that minimize a desired loss function. Sophisticated optimization algorithms typically require gradient information regarding the forward process, $f_{\mathrm{sim}}$, with respect to the parameters $ψ$. However, obtaining gradients from black-box simulators can often be prohibitively expensive or, in some cases, impossible. Furthermore, in many applications, practitioners aim to solve a set of related problems. Thus, starting the optimization ``ab initio'', i.e., from scratch, each time might be inefficient if the forward model is expensive to evaluate. To address these challenges, this paper introduces a novel method for solving classes of similar black-box optimization problems by learning an active learning policy that guides a differentiable surrogate's training and uses the surrogate's gradients to optimize the simulation parameters with gradient descent. After training the policy, downstream optimization of problems involving black-box simulators requires up to $\sim$90\% fewer expensive simulator calls compared to baselines such as local surrogate-based approaches, numerical optimization, and Bayesian methods.
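
The loop described in the abstract (a local surrogate queried for gradients, with a decision at each step on whether to spend simulator calls on retraining) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the paper's implementation: `f_sim` is a toy quadratic stand-in for an expensive simulator, the surrogate is a bootstrapped ensemble of linear models rather than a neural network, and a hand-written rule (ensemble disagreement above a threshold `tau`, or $ψ$ leaving the `eps`-box around the last training point) stands in for the learned policy.

```python
import numpy as np

def f_sim(psi, x):
    """Toy stand-in for an expensive black-box simulator (hypothetical)."""
    return float(np.sum((psi - x) ** 2))

def fit_ensemble(center, rng, eps=0.5, n_samples=16, n_models=5):
    """Fit bootstrapped linear surrogates on simulator data from the eps-box."""
    psis = center + rng.uniform(-eps, eps, size=(n_samples, center.size))
    xs = rng.normal(1.0, 0.05, size=psis.shape)  # assumed input distribution
    ys = np.array([f_sim(p, x) for p, x in zip(psis, xs)])
    feats = np.hstack([psis, np.ones((n_samples, 1))])  # features [psi, 1]
    coefs = []
    for _ in range(n_models):
        idx = rng.integers(0, n_samples, size=n_samples)  # bootstrap resample
        coef, *_ = np.linalg.lstsq(feats[idx], ys[idx], rcond=None)
        coefs.append(coef)
    return np.array(coefs)  # shape (n_models, dim + 1)

def disagreement(coefs, psi):
    """Ensemble-based uncertainty: std of the surrogate predictions at psi."""
    preds = coefs[:, :-1] @ psi + coefs[:, -1]
    return float(preds.std())

def optimize(psi0, n_steps=60, lr=0.1, eps=0.5, tau=0.3, n_samples=16, seed=0):
    """Gradient descent on psi using surrogate gradients only."""
    rng = np.random.default_rng(seed)
    psi = np.asarray(psi0, dtype=float).copy()
    coefs, center, n_calls = None, None, 0
    for _ in range(n_steps):
        # Heuristic stand-in for the learned policy: retrain only when the
        # surrogate is missing, psi left the local box, or the ensemble disagrees.
        retrain = (
            coefs is None
            or np.max(np.abs(psi - center)) > eps
            or disagreement(coefs, psi) > tau
        )
        if retrain:
            coefs = fit_ensemble(psi, rng, eps, n_samples)
            center = psi.copy()
            n_calls += n_samples  # each training sample costs one simulator call
        grad = coefs[:, :-1].mean(axis=0)  # gradient of the mean linear surrogate
        psi = psi - lr * grad
    return psi, n_calls

if __name__ == "__main__":
    psi_star, n_calls = optimize([3.0, -2.0])
    print("psi:", psi_star, "simulator calls:", n_calls)
```

Retraining at every step would cost `n_steps * n_samples` simulator calls in this sketch; skipping retraining while the surrogate is still trusted is where the savings come from. The paper replaces the fixed rule with a policy trained by reinforcement learning, which the sketch does not attempt to reproduce.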

Paper Structure (50 sections, 5 equations, 9 figures, 2 algorithms)

Introduction
Related work
Simulation-based Inference
Approximate-Gradient Optimization
Active Learning
Background
Policy-based Black-Box Optimization
Policy-based Approach
Sampling Strategy
State Definition
Action Definition
Reward Design
Local Surrogate
Local Surrogate Ensemble
Uncertainty Feature
...and 35 more sections

Figures (9)

Figure 1: Schematic view of our approach. (a) We study a black-box optimization problem (over parameters $\bm{\psi}$), with an emphasis on using gradient information from a fast differentiable surrogate $f_\phi$. (b) To optimize $\bm{\psi}$ sample-efficiently, we employ a policy $\pi_\theta$ to actively determine whether retraining the surrogate is necessary before using the gradient information.
Figure 2: Loss landscape and learned optimization trajectory for the Probabilistic Three Hump problem. The yellow region denotes $\bm{\psi}$ values that lead to termination. The $\epsilon=0.5$ neighbourhood around $\bm{\psi}_0$ (black cross) is visualized as the red box. Light green and blue arrows represent gradients obtained from the surrogate and after a simulator call, respectively.
Figure 3: Benchmark function results. Top row: Probabilistic Three Hump problem. Middle row: Rosenbrock problem. Bottom row: Nonlinear Submanifold Hump Problem. AMO (the lower the better) concerning (a) a fixed and (c) a parameterized ${\bm{x}}$ distribution. ANC (the lower the better) regarding (b) a fixed and (d) a parameterized ${\bm{x}}$ distribution. Uncertainties are quantified over evaluation episodes and different random seeds.
Figure 4: Wireless ray-tracing results. (a) Rendering of the indoor environment, (b) AMO (the lower the better), and (c) ANC (the lower the better). Uncertainties are quantified over evaluation episodes and different random seeds. Top row: Conference room environment. Bottom row: Office room environment.
Figure 5: Physics experiments results. (a) Schematic view of the active muon shield baseline configuration (baranov2017optimising). The detail of the "Target/Magnet hadron absorber" is not relevant to the current discussion and is shown only for completeness; see baranov2017optimising for more details. (b) AMO (the lower the better), and (c) ANC (the lower the better). Uncertainties are quantified over evaluation episodes and different random seeds.
...and 4 more figures
