Learning Surrogates for Offline Black-Box Optimization via Gradient Matching
Minh Hoang, Azza Fadhel, Aryan Deshwal, Janardhan Rao Doppa, Trong Nghia Hoang
TL;DR
The paper tackles offline black-box optimization, where surrogates learned from offline data may misguide gradient-based search outside the data regime. It derives a theoretical bound showing that the offline optimization gap is controlled by how closely the surrogate's gradient matches the true gradient, and introduces MATCH-OPT, a gradient-matching surrogate learning algorithm that leverages line-integral gradient information and monotonic trajectories from offline data. The theory is complemented by experiments on six design benchmarks, where MATCH-OPT performs reliably and competitively and improves over strong baselines. The work offers a principled, practical path to more robust offline optimization, with potential impact on material, chemical, and hardware design problems.
Abstract
The offline design optimization problem arises in numerous science and engineering applications, including material and chemical design, where expensive online experimentation necessitates the use of in silico surrogate functions to predict and maximize the target objective over candidate designs. Although these surrogates can be learned from offline data, their predictions are often inaccurate outside the offline data regime. This challenge raises a fundamental question about the impact of an imperfect surrogate model on the performance gap between its optima and the true optima, and to what extent the performance loss can be mitigated. Although prior work developed methods to improve the robustness of surrogate models and their associated optimization processes, a provably quantifiable relationship between an imperfect surrogate and the corresponding performance gap, as well as whether prior methods directly address it, remains elusive. To shed light on this important question, we present a theoretical framework to understand offline black-box optimization, by explicitly bounding the optimization quality based on how well the surrogate matches the latent gradient field that underlies the offline data. Inspired by our theoretical analysis, we propose a principled black-box gradient matching algorithm to create effective surrogate models for offline optimization, improving over prior approaches on various real-world benchmarks.
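
To make the idea of matching a latent gradient field from black-box data more concrete, the sketch below illustrates the line-integral identity that such a method can build on: for two designs x_a and x_b, the difference f(x_b) - f(x_a) equals the integral of the gradient along the segment between them, so observed output differences can supervise a surrogate's gradient without direct gradient labels. This is an illustrative toy, not the authors' MATCH-OPT implementation. The function names, the quadratic toy objective, the hand-written gradient fields, and the midpoint quadrature are all assumptions made for this example.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
GradFn = Callable[[Vector], List[float]]


def line_integral_of_gradient(grad: GradFn, x_a: Vector, x_b: Vector,
                              num_steps: int = 64) -> float:
    """Midpoint-rule estimate of the integral over t in [0, 1] of
    grad(x_a + t * (x_b - x_a)) dotted with (x_b - x_a)."""
    delta = [b - a for a, b in zip(x_a, x_b)]
    total = 0.0
    for k in range(num_steps):
        t = (k + 0.5) / num_steps
        point = [a + t * d for a, d in zip(x_a, delta)]
        g = grad(point)
        total += sum(gi * di for gi, di in zip(g, delta))
    return total / num_steps


def gradient_matching_loss(grad: GradFn,
                           pairs: Sequence[Tuple[Tuple[Vector, float],
                                                 Tuple[Vector, float]]],
                           num_steps: int = 64) -> float:
    """Mean squared gap between observed output differences and the
    line integral of a candidate gradient field (hypothetical loss)."""
    gaps = []
    for (x_a, y_a), (x_b, y_b) in pairs:
        predicted = line_integral_of_gradient(grad, x_a, x_b, num_steps)
        gaps.append(((y_b - y_a) - predicted) ** 2)
    return sum(gaps) / len(gaps)


# Toy black-box objective (assumption): f(x) = -(x0^2 + x1^2), maximized at 0.
def toy_objective(x: Vector) -> float:
    return -sum(v * v for v in x)


def true_grad(x: Vector) -> List[float]:
    return [-2.0 * v for v in x]


def biased_grad(x: Vector) -> List[float]:
    # A deliberately wrong gradient field, offset by a constant.
    return [-2.0 * v + 0.5 for v in x]


# Offline data: designs ordered so that the objective increases along the path.
designs = [[1.0, 2.0], [0.5, 1.0], [0.0, 0.0]]
labeled = [(x, toy_objective(x)) for x in designs]
pairs = list(zip(labeled[:-1], labeled[1:]))

loss_true = gradient_matching_loss(true_grad, pairs)
loss_biased = gradient_matching_loss(biased_grad, pairs)
print("loss with true gradient field:  ", loss_true)
print("loss with biased gradient field:", loss_biased)
```

In this toy setting the true gradient field reproduces the observed output differences almost exactly, while the biased field incurs a clearly positive loss. In a learned surrogate, the hand-written gradient functions would be replaced by the gradient of a trainable model with respect to its input, and the loss would be minimized over the model's parameters.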
