Offline Model-Based Optimization via Policy-Guided Gradient Search

Yassine Chemingui, Aryan Deshwal, Trong Nghia Hoang, Janardhan Rao Doppa

TL;DR

This work introduces a new learning-to-search perspective for offline optimization: it reformulates offline optimization as an offline reinforcement learning problem and explicitly learns the best policy for a given surrogate model built from the offline data.

Abstract

Offline optimization is an emerging problem in many experimental engineering domains, including protein, drug or aircraft design, where online experimentation to collect evaluation data is too expensive or dangerous. To avoid such experiments, one has to optimize an unknown function given only its offline evaluations at a fixed set of inputs. A naive solution to this problem is to learn a surrogate model of the unknown function and optimize this surrogate instead. However, such a naive optimizer is prone to erroneously overestimating the surrogate (possibly due to over-fitting on a biased sample of function evaluations) on inputs outside the offline dataset. Prior approaches addressing this challenge have primarily focused on learning robust surrogate models. However, their search strategies are derived from the surrogate model rather than from the actual offline data. To fill this important gap, we introduce a new learning-to-search perspective for offline optimization by reformulating it as an offline reinforcement learning problem. Our proposed policy-guided gradient search approach explicitly learns the best policy for a given surrogate model created from the offline data. Our empirical results on multiple benchmarks demonstrate that the learned optimization policy can be combined with existing offline surrogates to significantly improve optimization performance.
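A toy illustration of the naive surrogate-optimization failure mode described in the abstract. Everything here is an assumption chosen for clarity, not the paper's setup: a hypothetical objective `true_f` with its optimum at $x = 1$, a biased offline sample drawn only from $[-2, 0]^2$, and a linear least-squares surrogate. Fixed-step gradient ascent on the surrogate leaves the data region, where the surrogate's prediction keeps rising while the true value collapses.

```python
import numpy as np

def true_f(x):
    # Hypothetical black-box objective (unknown to the optimizer); maximum at x = 1.
    return -np.sum((x - 1.0) ** 2, axis=-1)

rng = np.random.default_rng(0)
d = 2
# Biased offline dataset: every input lies in [-2, 0]^d, to the left of the optimum.
X = rng.uniform(-2.0, 0.0, size=(200, d))
y = true_f(X)

# Linear surrogate f_hat(x) = w @ x + b, fitted by ordinary least squares.
A = np.hstack([X, np.ones((len(X), 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
w, b = coef[:-1], coef[-1]

def f_hat(x):
    return x @ w + b

def grad_f_hat(x):
    # The gradient of a linear model is the same everywhere.
    return w

# Naive optimizer: start from the best offline input, take fixed-size ascent steps.
x = X[np.argmax(y)].copy()
for _ in range(50):
    x = x + 0.1 * grad_f_hat(x)

print(f"best offline value: {y.max():.2f}")
print(f"surrogate at final x: {f_hat(x):.2f}, true value at final x: {true_f(x):.2f}")
```

Because the offline sample never covers the region past the optimum, nothing in the data stops the surrogate from extrapolating upward; this is the overestimation that motivates deriving the search strategy from the offline data rather than from the surrogate alone.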

Paper Structure (7 sections, 5 equations, 2 figures, 11 tables, 2 algorithms)

Figures (2)

  • Figure 1: High-level overview of the policy-guided gradient search (PGS) approach for offline black-box optimization (BBO). The key idea is to cast the offline BBO problem as an offline RL problem. This reduction is accomplished by constructing random trajectories from a subset of inputs with high function values in the given offline data $\mathcal{D}$ (e.g., the top $p$ percentile, where $p$ is determined in a data-driven manner using our offline state estimation approach). The policy $\pi$ selects a step-size vector for a gradient-based update on a trained surrogate model $\hat{f}_{\theta}$. Given a learned policy $\pi$, the surrogate model $\hat{f}_{\theta}$, and a starting input $x_0$ with a high function value sampled from $\mathcal{D}$, PGS performs $T$ steps of gradient search, querying $\pi$ for the step-size vector $\alpha$ at each search step.
  • Figure 2: PGS Action Norms During Search Steps
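The pipeline in Figure 1 can be sketched as follows. This is a minimal, non-authoritative sketch under several loudly stated assumptions. We read the caption's update as $x_{t+1} = x_t + \alpha_t \odot \nabla_x \hat{f}_{\theta}(x_t)$ with an elementwise step-size vector $\alpha_t = \pi(x_t)$. The quadratic least-squares surrogate stands in for the trained $\hat{f}_{\theta}$; the trajectory construction (sample $T+1$ top-$p$ inputs and sort them by value, then relabel each transition with the $\alpha$ that produces it) and the reward (value improvement) are our assumptions; $p$ is fixed by hand rather than chosen by the paper's data-driven procedure; and `policy` is a hypothetical constant step-size vector standing in for the policy that offline RL would learn from `transitions`. All names (`make_trajectory`, `pgs_search`, `policy`) are illustrative. The recorded $\|\alpha_t\|$ values correspond to the quantity named in Figure 2's title.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T, p = 2, 5, 20  # input dim, search steps, top-p percentile (hypothetical values)

# Hypothetical offline data D = {(x_i, y_i)}; the unknown function peaks at x = 1.
X = rng.uniform(-2.0, 2.0, size=(500, d))
y = -np.sum((X - 1.0) ** 2, axis=1) + 0.05 * rng.standard_normal(500)

# Stand-in surrogate f_hat(x) = sum_j (a_j x_j^2 + b_j x_j) + c, fitted by least squares.
feats = np.hstack([X ** 2, X, np.ones((len(X), 1))])
coef, *_ = np.linalg.lstsq(feats, y, rcond=None)
a, b, c = coef[:d], coef[d:2 * d], coef[-1]

def f_hat(x):
    return float(np.sum(a * x ** 2 + b * x) + c)

def grad_f_hat(x):
    return 2.0 * a * x + b

# Step 1: keep the inputs whose values fall in the top-p percentile of D.
mask = y >= np.percentile(y, 100 - p)
X_top, y_top = X[mask], y[mask]

def make_trajectory(T, eps=1e-6):
    """Assumed construction: sample T+1 top inputs and order them by value.

    Each transition x_t -> x_{t+1} is relabeled with the step-size vector
    alpha_t solving x_{t+1} = x_t + alpha_t * grad_f_hat(x_t) elementwise.
    """
    idx = rng.choice(len(X_top), size=T + 1, replace=False)
    idx = idx[np.argsort(y_top[idx])]
    states, values = X_top[idx], y_top[idx]
    grads = np.array([grad_f_hat(s) for s in states[:-1]])
    # Avoid dividing by near-zero gradient entries while keeping their sign.
    grads = np.where(np.abs(grads) < eps, np.copysign(eps, grads), grads)
    actions = (states[1:] - states[:-1]) / grads
    rewards = values[1:] - values[:-1]  # non-negative because of the sort
    return list(zip(states[:-1], actions, rewards, states[1:]))

# Step 2: an offline RL dataset of (s, a, r, s') tuples from random trajectories.
transitions = [tr for _ in range(100) for tr in make_trajectory(T)]

# Step 3: gradient search guided by a policy. A learned offline-RL policy would
# go here; this hypothetical stand-in always returns the same step-size vector.
def policy(x):
    return np.full(d, 0.2)

def pgs_search(x0, policy, T):
    x, norms = x0.copy(), []
    for _ in range(T):
        alpha = policy(x)                  # step-size vector for this step
        x = x + alpha * grad_f_hat(x)      # elementwise gradient-ascent update
        norms.append(float(np.linalg.norm(alpha)))
    return x, norms

x0 = X_top[rng.integers(len(X_top))]
x_T, action_norms = pgs_search(x0, policy, T)
print(f_hat(x0), f_hat(x_T), action_norms)
```

Swapping the constant `policy` for one trained on `transitions` (for example with an off-the-shelf offline RL algorithm) is the step this sketch deliberately leaves out; keeping it separate makes clear that the search loop only needs a callable mapping a state $x_t$ to a step-size vector.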