ZeroShotOpt: Towards Zero-Shot Pretrained Models for Efficient Black-Box Optimization

Jamison Meindl, Yunsheng Tian, Tony Cui, Veronika Thost, Zhang-Wei Hong, Johannes Dürholt, Jie Chen, Wojciech Matusik, Mina Konaković Luković

TL;DR

This work addresses the challenge of efficiently optimizing expensive, derivative-free black-box functions under tight budgets by introducing ZeroShotOpt, a pretrained transformer-based optimizer for continuous problems up to 20D. It trains a 200M-parameter decoder-only transformer through offline reinforcement learning on a massive corpus of optimization trajectories generated from 12 Bayesian optimization variants and millions of GP-based synthetic functions, enabling robust zero-shot generalization to unseen benchmarks. The model achieves sample efficiency competitive with traditional Bayesian optimization across in- and out-of-distribution tasks, while offering a reusable foundation that can be fine-tuned to specific domains such as HPO-B. Moreover, ZeroShotOpt provides practical advantages in runtime and scalability, with open-source data and code to support further extensions and real-world deployment.

Abstract

Global optimization of expensive, derivative-free black-box functions requires extreme sample efficiency. While Bayesian optimization (BO) is the current state-of-the-art, its performance hinges on surrogate and acquisition function hyper-parameters that are often hand-tuned and fail to generalize across problem landscapes. We present ZeroShotOpt, a general-purpose, pretrained model for continuous black-box optimization tasks ranging from 2D to 20D. Our approach leverages offline reinforcement learning on large-scale optimization trajectories collected from 12 BO variants. To scale pretraining, we generate millions of synthetic Gaussian process-based functions with diverse landscapes, enabling the model to learn transferable optimization policies. As a result, ZeroShotOpt achieves robust zero-shot generalization on a wide array of unseen benchmarks, matching or surpassing the sample efficiency of leading global optimizers, including BO, while also offering a reusable foundation for future extensions and improvements. Our open-source code, dataset, and model are available at: https://github.com/jamisonmeindl/zeroshotopt
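To make the idea of GP-based synthetic functions concrete, the following is a minimal sketch of drawing one random function from a Gaussian process prior. It is not the authors' generator: it assumes a single RBF kernel and uses the random Fourier feature approximation as a stand-in, whereas the paper describes various kernels and parameters. The function name `sample_gp_function` and the parameters `lengthscale` and `num_features` are hypothetical.

```python
import math
import random


def sample_gp_function(dim, lengthscale=0.5, num_features=256, seed=0):
    """Return a callable that approximates one draw from a zero-mean GP prior.

    Assumption: RBF kernel, approximated with random Fourier features.
    """
    rng = random.Random(seed)
    # The spectral density of an RBF kernel is Gaussian with std 1 / lengthscale.
    freqs = [
        [rng.gauss(0.0, 1.0 / lengthscale) for _ in range(dim)]
        for _ in range(num_features)
    ]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    weights = [rng.gauss(0.0, 1.0) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def f(x):
        if len(x) != dim:
            raise ValueError(f"expected a point with {dim} coordinates")
        total = 0.0
        for w, omega, b in zip(weights, freqs, phases):
            total += w * math.cos(sum(o * xi for o, xi in zip(omega, x)) + b)
        return scale * total

    return f


# Usage: one synthetic 2D objective, evaluated at a single point.
objective = sample_gp_function(dim=2, seed=42)
value = objective([0.25, 0.75])
```

Changing `seed` yields a different landscape, and shrinking `lengthscale` makes the landscape more rugged, which is one simple way such a generator could produce diverse training functions.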

Paper Structure

This paper contains 27 sections, 2 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: Overall Diagram of ZeroShotOpt. We use a combination of diverse synthetic functions and classical optimizers to train a pretrained transformer model for efficient black-box optimization.
  • Figure 2: 2D synthetic functions generated by our GP-based function generator with various kernels and parameters. The red star is the global minimum and darker color signifies lower function value.
  • Figure 3: Illustration of the ZeroShotOpt architecture. The model embeds continuous inputs using a fixed sinusoidal embedding and two learned positional embeddings. The main model is a causal decoder-only transformer, trained with a loss over the binned action and state space.
  • Figure 4: Ablation on scaling strategies during inference.
  • Figure 5: Ablation on Model Size. Mean normalized performance over steps 10, 20, 30, and 40 on 2D, 5D, 10D, and 20D BBOB and VLSE functions. We test over 500 functions from each dataset and report the standard deviation across 5 independent seed splits.
  • ...and 1 more figure
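The Figure 3 caption mentions two ingredients that a short sketch can illustrate: binning continuous values into discrete tokens, and embedding a continuous value with fixed sinusoids. The code below is only an illustration under stated assumptions. The bin count, value range, embedding width, and frequency schedule are hypothetical choices, and the helper names `to_bin`, `from_bin`, and `sinusoidal_embedding` do not come from the paper.

```python
import math


def to_bin(value, low=0.0, high=1.0, num_bins=1000):
    """Map a continuous value to a bin index in [0, num_bins - 1]."""
    clipped = min(max(value, low), high)
    idx = int((clipped - low) / (high - low) * num_bins)
    return min(idx, num_bins - 1)  # keep value == high inside the last bin


def from_bin(idx, low=0.0, high=1.0, num_bins=1000):
    """Map a bin index back to the center of its bin."""
    width = (high - low) / num_bins
    return low + (idx + 0.5) * width


def sinusoidal_embedding(value, dim=8, max_period=10000.0):
    """Fixed (non-learned) embedding of a scalar as interleaved sin/cos pairs.

    Assumption: dim is even; frequencies follow a geometric schedule.
    """
    half = dim // 2
    emb = []
    for i in range(half):
        freq = 1.0 / (max_period ** (i / half))
        emb.append(math.sin(value * freq))
        emb.append(math.cos(value * freq))
    return emb


# Usage: tokenize one coordinate and embed the same raw value.
token = to_bin(0.5)
recovered = from_bin(token)
vector = sinusoidal_embedding(0.5, dim=8)
```

Binning turns prediction of the next query point into a classification problem over tokens, at the cost of a quantization error of at most half a bin width when decoding with `from_bin`.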