Learning Surrogates for Offline Black-Box Optimization via Gradient Matching

Minh Hoang, Azza Fadhel, Aryan Deshwal, Janardhan Rao Doppa, Trong Nghia Hoang

TL;DR

The paper tackles offline black-box optimization, where surrogates learned from offline data may misguide gradient-based search outside the data regime. It derives a theoretical bound showing that the offline optimization gap is controlled by how closely the surrogate's gradient matches the true gradient, and introduces MATCH-OPT, a gradient-matching surrogate learning algorithm that leverages line-integral gradient information and monotonic trajectories from offline data. Extensive experiments on six design benchmarks complement the theory: MATCH-OPT performs reliably and competitively, improving over strong baselines. The work offers a principled, practical path to more robust offline optimization, with potential impact on material, chemical, and hardware design problems.
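
The "line-integral gradient information" mentioned above can be made concrete with the gradient theorem. As a sketch (the notation $\mathbf{x}, \mathbf{x}'$ for a pair of offline designs is an assumption here, not taken from the paper), for any differentiable $g$:

$$
g(\mathbf{x}') - g(\mathbf{x}) = \int_0^1 \nabla g\big(\mathbf{x} + t(\mathbf{x}' - \mathbf{x})\big)^\top (\mathbf{x}' - \mathbf{x})\, \mathrm{d}t
$$

The left-hand side is observable from offline data alone, so a surrogate $g_\phi$ can plausibly be trained by penalizing the squared mismatch between this observed difference and the same integral evaluated with $\nabla g_\phi$ in place of $\nabla g$. The exact loss used by MATCH-OPT may differ in its details.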

Abstract

The offline design optimization problem arises in numerous science and engineering applications, including material and chemical design, where expensive online experimentation necessitates the use of in silico surrogate functions to predict and maximize the target objective over candidate designs. Although these surrogates can be learned from offline data, their predictions are often inaccurate outside the offline data regime. This challenge raises a fundamental question about the impact of an imperfect surrogate model on the performance gap between its optima and the true optima, and to what extent the performance loss can be mitigated. Although prior work developed methods to improve the robustness of surrogate models and their associated optimization processes, a provably quantifiable relationship between an imperfect surrogate and the corresponding performance gap, as well as whether prior methods directly address it, remains elusive. To shed light on this important question, we present a theoretical framework to understand offline black-box optimization by explicitly bounding the optimization quality based on how well the surrogate matches the latent gradient field that underlies the offline data. Inspired by our theoretical analysis, we propose a principled black-box gradient matching algorithm to create effective surrogate models for offline optimization, improving over prior approaches on various real-world benchmarks.

Paper Structure

This paper contains 19 sections, 5 theorems, 59 equations, 4 figures, 9 tables, 1 algorithm.

Key Result

Theorem 3.2

Suppose $g(\mathbf{x})$ is an $\ell$-Lipschitz continuous and $\mu$-Lipschitz smooth function. The worst-case performance gap, $\mathfrak{G}_{m, \lambda} \triangleq \max_{\mathbf{x}}\mathfrak{G}_{m,\lambda}(\mathbf{x})$, between $g$ and some arbitrary surrogate $g_{\phi}$ is upper-bounded by: Note that despite the exponential dependence on $m$, the bound becomes tight and independent of $m$ when t
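
To give a feel for the quantity this theorem bounds, here is a minimal toy sketch in plain Python. It assumes a particular reading of the gap that is not spelled out in this summary: run $m$ gradient-ascent steps with step size $\lambda$ from the same start, once following the true gradient and once following the surrogate gradient, then compare the true objective at the two endpoints. The objective `g_true`, the constant gradient error `eps`, and all function names are hypothetical and chosen only for illustration.

```python
def grad_ascent(grad_fn, x0, m, lam):
    """Run m steps of gradient ascent with step size lam, starting from x0."""
    x = list(x0)
    for _ in range(m):
        g = grad_fn(x)
        x = [xi + lam * gi for xi, gi in zip(x, g)]
    return x


def g_true(x):
    """Hypothetical true objective: concave quadratic maximized at (1, ..., 1)."""
    return -sum((xi - 1.0) ** 2 for xi in x)


def grad_true(x):
    """Analytic gradient of g_true."""
    return [-2.0 * (xi - 1.0) for xi in x]


def make_surrogate_grad(eps):
    """Surrogate gradient: true gradient plus a constant error eps per coordinate."""
    def grad(x):
        return [gi + eps for gi in grad_true(x)]
    return grad


def performance_gap(eps, x0, m, lam):
    """Assumed gap: |g(x_m following grad g) - g(x_m following the surrogate gradient)|."""
    x_true = grad_ascent(grad_true, x0, m, lam)
    x_sur = grad_ascent(make_surrogate_grad(eps), x0, m, lam)
    return abs(g_true(x_true) - g_true(x_sur))


if __name__ == "__main__":
    for eps in (0.0, 0.1, 0.5):
        print(eps, performance_gap(eps, [0.0, 0.0], m=20, lam=0.1))
```

In this toy setting the gap is zero when the surrogate gradient is exact and grows as the gradient error grows, which is the qualitative behavior the bound describes. It is an illustration, not a check of the theorem's constants.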

Figures (4)

  • Figure 1: Comparison of gradient estimation error incurred by MATCH-OPT (orange) and standard regression (blue) while learning the gradient field of the Shekel function on $4$-dimensional input space at different out-of-distribution (OOD) settings where test inputs were drawn from $\mathcal{N}(0, \alpha\mathbf{I})$ while training inputs were drawn from $\mathcal{N}(0, \mathbf{I})$. Smaller $\alpha$ indicates larger deviation from the offline data regime, which widens the performance gap between MATCH-OPT and standard regression.
  • Figure 2: Our approach MATCH-OPT synthesizes input sequences with monotonically increasing target function values from the offline dataset, which are used to train a parametric surrogate model. Our loss function incorporates both standard regression loss (i.e., value matching) and a novel gradient matching loss. We perform gradient search on the trained surrogate to find optimized designs.
  • Figure 3: Plots of (a) mean normalized ranks (MNRs); and (b) mean (normalized) performance of baselines at all performance percentiles.
  • Figure 4: Plots of distributions of mean normalized rank (MNR) of the tested algorithms across all tasks at the (a) $25$-th, (b) $50$-th, (c) $75$-th, and (d) $100$-th performance percentile levels.
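
The training objective described in the Figure 2 caption (value matching plus gradient matching on monotonic sequences) can be sketched as follows. This is a hedged illustration in plain Python, not the authors' implementation. The midpoint-rule quadrature, the use of consecutive pairs along a trajectory, the mixing `weight`, and the names `line_integral` and `combined_loss` are all assumptions.

```python
def line_integral(grad_fn, x, x_next, n_steps=32):
    """Midpoint-rule estimate of the line integral of grad_fn along x -> x_next."""
    delta = [b - a for a, b in zip(x, x_next)]
    total = 0.0
    for k in range(n_steps):
        t = (k + 0.5) / n_steps
        point = [a + t * d for a, d in zip(x, delta)]
        grad = grad_fn(point)
        total += sum(gi * di for gi, di in zip(grad, delta))
    return total / n_steps


def combined_loss(value_fn, grad_fn, trajectory, weight=0.5):
    """Value-matching plus gradient-matching loss on one trajectory.

    trajectory: list of (x, y) pairs, assumed sorted by increasing y.
    value_fn / grad_fn: the surrogate and its gradient (hypothetical callables).
    """
    # Value matching: standard squared regression error.
    value_loss = sum((value_fn(x) - y) ** 2 for x, y in trajectory) / len(trajectory)
    # Gradient matching: the observed difference in y should equal the line
    # integral of the surrogate gradient between consecutive designs.
    pairs = list(zip(trajectory[:-1], trajectory[1:]))
    grad_loss = sum(
        ((y2 - y1) - line_integral(grad_fn, x1, x2)) ** 2
        for (x1, y1), (x2, y2) in pairs
    ) / len(pairs)
    return weight * value_loss + (1.0 - weight) * grad_loss


def quad(x):
    """Toy objective used to label the example trajectory."""
    return -sum((xi - 1.0) ** 2 for xi in x)


def quad_grad(x):
    """Analytic gradient of quad."""
    return [-2.0 * (xi - 1.0) for xi in x]


if __name__ == "__main__":
    xs = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
    traj = [(x, quad(x)) for x in xs]  # y values increase along the sequence
    print(combined_loss(quad, quad_grad, traj))                                   # perfect surrogate
    print(combined_loss(lambda x: 0.0, lambda x: [0.0] * len(x), traj))            # constant surrogate
```

A surrogate that reproduces both the values and the gradient field of the toy objective drives the loss to (numerically) zero, while a constant surrogate is penalized by both terms. In practice the surrogate would be a neural network and its gradient would come from automatic differentiation.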

Theorems & Definitions (9)

  • Definition 3.1
  • Theorem 3.2: Worst-case optimization risk bound in terms of gradient estimation error
  • Theorem 4.1: Generalized worst-case optimization risk bound
  • Theorem 1.1: Worst-case optimization risk in terms of gradient estimation error
  • proof
  • Lemma 1.2
  • proof
  • Theorem 4.1: Generalized worst-case optimization risk bound
  • proof