BOFormer: Learning to Solve Multi-Objective Bayesian Optimization via Non-Markovian RL

Yu-Heng Hung, Kai-Jie Lin, Yu-Heng Lin, Chien-Yi Wang, Cheng Sun, Ping-Chun Hsieh

TL;DR

This paper tackles multi-objective Bayesian optimization (MOBO), whose non-Markovian dynamics cause directly extended acquisition functions to struggle with the hypervolume identifiability issue. It introduces BOFormer, a Transformer-based generalized DQN that treats MOBO as sequence modeling with a Q-augmented observation to make non-myopic decisions. Key innovations include a demo-policy for exploration, a Prioritized Trajectory Replay Buffer for off-policy learning, and domain-agnostic representations that enable zero-shot and cross-domain transfer without Monte-Carlo inference. Experiments on synthetic MOBO tasks and real-world hyperparameter optimization (HPO) for 3D Gaussian Splatting show that BOFormer consistently attains higher hypervolume than rule-based and learning-based baselines.

Abstract

Bayesian optimization (BO) offers an efficient pipeline for optimizing black-box functions with the help of a Gaussian process prior and an acquisition function (AF). Recently, in the context of single-objective BO, learning-based AFs have shown promising empirical results owing to their favorable non-myopic nature. Despite this, the direct extension of these approaches to multi-objective Bayesian optimization (MOBO) suffers from the hypervolume identifiability issue, which results from the non-Markovian nature of MOBO problems. To tackle this, inspired by the non-Markovian RL literature and the success of Transformers in language modeling, we present a generalized deep Q-learning framework and propose BOFormer, which substantiates this framework for MOBO via sequence modeling. Through extensive evaluation, we demonstrate that BOFormer consistently outperforms the benchmark rule-based and learning-based algorithms in various synthetic MOBO and real-world multi-objective hyperparameter optimization problems. We have made the source code publicly available to encourage further research in this direction.
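
For readers new to the pipeline named in the abstract, the following is a minimal sketch of single-objective BO with a Gaussian process prior and a rule-based acquisition function. It is not BOFormer or any method from the paper: it uses the standard expected-improvement AF, a zero-mean GP with a unit-variance RBF kernel, and a hypothetical one-dimensional `objective`. All names, the length scale, and the grid are assumptions chosen for illustration.

```python
import math
import numpy as np

def rbf_kernel(a, b, length_scale=0.2):
    # Squared-exponential kernel with unit prior variance (assumed hyperparameters).
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_obs, y_obs, x_query, noise=1e-6):
    # Posterior mean and standard deviation of a zero-mean GP at x_query.
    k_oo = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    k_qo = rbf_kernel(x_query, x_obs)
    mean = k_qo @ np.linalg.solve(k_oo, y_obs)
    var = 1.0 - np.sum(k_qo * np.linalg.solve(k_oo, k_qo.T).T, axis=1)
    return mean, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mean, std, best):
    # Standard (myopic) expected improvement for maximization.
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf

def objective(x):
    # Hypothetical black-box function, maximized at x = 0.7.
    return -(x - 0.7) ** 2

x_init = np.array([0.1, 0.5, 0.9])
x_obs, y_obs = x_init.copy(), objective(x_init)
grid = np.linspace(0.0, 1.0, 101)

for _ in range(5):
    mean, std = gp_posterior(x_obs, y_obs, grid)
    ei = expected_improvement(mean, std, y_obs.max())
    x_next = grid[np.argmax(ei)]          # query the point with the largest AF value
    x_obs = np.append(x_obs, x_next)
    y_obs = np.append(y_obs, objective(x_next))

print("best value found:", y_obs.max())
```

Expected improvement scores each candidate by its one-step gain only, which is the myopic behavior that learning-based AFs aim to improve on.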

Paper Structure

This paper contains 31 sections, 1 theorem, 13 equations, 13 figures, 3 tables, 2 algorithms.

Key Result

Proposition 3.1

The pair $(V^*, Q^*)$ is the unique solution to a system of equations in $(V, Q)$, where $V:\mathcal{H} \rightarrow \mathbb{R}$ and $Q:\mathcal{H}\times \mathcal{A}\rightarrow \mathbb{R}$ are bounded real-valued functions.

Figures (13)

  • Figure 1: Left: In SOBO, an RL-based AF (e.g., FSAF; Hsieh et al., 2021) takes the posterior mean and standard deviation $(\mu_t(x),\sigma_t(x))$ and the best function value observed so far $y_t^*$ as input and then outputs the AF value $\Upsilon_t(x)$. A direct extension to MOBO simply takes into account the same set of information about all $K$ objective functions. Right: The hypervolume identifiability issue can be illustrated by comparing the hypervolume improvement incurred by the sample $x_3$ in the two scenarios above. Although the AF inputs at $x_3$ are the same in both scenarios, the increases in hypervolume upon sampling $x_3$ are quite different. Hence, the increase in hypervolume is not identifiable solely from the AF input $(\mu_t^{(i)}(x),\sigma_t^{(i)}(x),y_t^{(i)*})_{i\in [K]}$ of the existing RL-based AFs.
  • Figure 2: BOFormer comprises two distinct networks. The upper network is the policy network, which uses the historical data and the Q-values predicted by the target network to estimate the Q-values for action selection. The lower network is the target network, which constructs Q-values for past observation-action pairs.
  • Figure 3: Performance profiles of hypervolume at the final step.
  • Figure 4: Attained hypervolumes of BOFormer under various sequence lengths.
  • Figure 5: Training loss for three variants of BOFormer models.
  • ...and 8 more figures
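
The hypervolume identifiability issue described in the Figure 1 caption can be reproduced with a small numeric sketch. The two histories below are made-up values, not data from the paper. They share the same per-objective best values, yet the same new observation yields different hypervolume improvements. The sketch assumes two objectives, maximization, and a reference point at the origin, and it abstracts away the GP posterior by comparing realized objective values directly. The helper names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def hypervolume_2d(points: List[Point], ref: Point = (0.0, 0.0)) -> float:
    # Area dominated by `points` and bounded below by `ref` (maximization).
    pts = [(a, b) for a, b in points if a > ref[0] and b > ref[1]]
    # Sweep in decreasing order of the first objective; each point adds the
    # strip of the second objective that earlier points did not cover.
    pts.sort(key=lambda p: p[0], reverse=True)
    area, covered = 0.0, ref[1]
    for a, b in pts:
        if b > covered:
            area += (a - ref[0]) * (b - covered)
            covered = b
    return area

def per_objective_best(points: List[Point]) -> Point:
    # The only history summary a directly extended AF would see.
    return (max(p[0] for p in points), max(p[1] for p in points))

def hv_improvement(history: List[Point], new_point: Point) -> float:
    return hypervolume_2d(history + [new_point]) - hypervolume_2d(history)

# Two illustrative histories with identical per-objective best values.
scenario_a = [(4.0, 1.0), (1.0, 4.0)]
scenario_b = [(4.0, 3.0), (3.0, 4.0)]
x3_value = (3.0, 3.0)  # objective values observed at the third sample

best_a = per_objective_best(scenario_a)
best_b = per_objective_best(scenario_b)
gain_a = hv_improvement(scenario_a, x3_value)
gain_b = hv_improvement(scenario_b, x3_value)

print(best_a, best_b)  # (4.0, 4.0) (4.0, 4.0)
print(gain_a, gain_b)  # 4.0 0.0
```

In scenario A the new point fills a gap in the Pareto front, while in scenario B it is dominated by both earlier samples. The per-objective summary `(4.0, 4.0)` cannot distinguish the two cases, which is why the full observation history matters.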

Theorems & Definitions (7)

  • Proposition 3.1 (Dong et al., 2022)
  • Remark 4.1
  • Remark 4.2
  • Remark C.1
  • Remark D.1
  • Remark D.2
  • Remark D.3