BOFormer: Learning to Solve Multi-Objective Bayesian Optimization via Non-Markovian RL

Yu-Heng Hung; Kai-Jie Lin; Yu-Heng Lin; Chien-Yi Wang; Cheng Sun; Ping-Chun Hsieh

TL;DR

This paper tackles multi-objective Bayesian optimization (MOBO), a non-Markovian problem in which directly extending learning-based acquisition functions runs into the hypervolume identifiability issue. It introduces BOFormer, a Transformer-based generalized DQN that treats MOBO as sequence modeling and uses a Q-augmented observation to make non-myopic decisions. Key innovations include a demo-policy for exploration, a Prioritized Trajectory Replay Buffer for off-policy learning, and domain-agnostic representations that enable zero-shot and cross-domain transfer without Monte-Carlo inference. Experiments on synthetic MOBO tasks and real-world hyperparameter optimization (3D Gaussian Splatting) show that BOFormer consistently attains higher hypervolume than rule-based and learning-based baselines.

Abstract

Bayesian optimization (BO) offers an efficient pipeline for optimizing black-box functions with the help of a Gaussian process prior and an acquisition function (AF). Recently, in the context of single-objective BO, learning-based AFs have shown promising empirical results owing to their favorable non-myopic nature. Despite this, the direct extension of these approaches to multi-objective Bayesian optimization (MOBO) suffers from the hypervolume identifiability issue, which results from the non-Markovian nature of MOBO problems. To tackle this, inspired by the non-Markovian RL literature and the success of Transformers in language modeling, we present a generalized deep Q-learning framework and propose BOFormer, which substantiates this framework for MOBO via sequence modeling. Through extensive evaluation, we demonstrate that BOFormer consistently outperforms the benchmark rule-based and learning-based algorithms in various synthetic MOBO and real-world multi-objective hyperparameter optimization problems. We have made the source code publicly available to encourage further research in this direction.
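
To make the role of hypervolume concrete, here is a minimal illustrative sketch that is not taken from the paper. It computes the 2D hypervolume indicator for a maximization problem and shows that the gain from one and the same new observation depends on the points observed before it. This is one intuition for why MOBO is history-dependent; it is not the paper's formal statement of the identifiability issue. The function names, the reference point (0, 0), and the sample points are all assumptions chosen for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def hypervolume_2d(points: List[Point], ref: Point = (0.0, 0.0)) -> float:
    """Area dominated by `points` and bounded below by `ref` (maximization)."""
    # Ignore points that do not strictly improve on the reference point.
    pts = [p for p in points if p[0] > ref[0] and p[1] > ref[1]]
    # Sweep from the largest first objective to the smallest,
    # adding one horizontal strip per point that raises the best second objective.
    pts.sort(key=lambda p: p[0], reverse=True)
    area = 0.0
    best_y = ref[1]
    for x, y in pts:
        if y > best_y:
            area += (x - ref[0]) * (y - best_y)
            best_y = y
    return area


def hypervolume_gain(history: List[Point], new_point: Point,
                     ref: Point = (0.0, 0.0)) -> float:
    """Increase in hypervolume from adding `new_point` to `history`."""
    return hypervolume_2d(history + [new_point], ref) - hypervolume_2d(history, ref)


# The same candidate observation under two hypothetical histories.
candidate = (2.0, 2.0)
gain_a = hypervolume_gain([(1.0, 3.0)], candidate)  # candidate extends the front
gain_b = hypervolume_gain([(3.0, 3.0)], candidate)  # candidate is dominated
```

In this sketch the candidate yields a positive gain under the first history and no gain under the second, so a reward based on hypervolume improvement cannot be determined from the latest observation alone.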
