FoMEMO: Towards Foundation Models for Expensive Multi-objective Optimization

Yiming Yao, Fei Liu, Liang Zhao, Xi Lin, Yilu Liu, Qingfu Zhang

TL;DR

FoMEMO proposes a foundation-model framework for expensive multi-objective optimization by pre-training a transformer-based Prior-Data Fitted Network on hundreds of millions of synthetic trajectories to predict aggregated posteriors conditioned on domain context and user preferences. In the in-context phase, the model enables fast candidate generation via preference-based or preference-free acquisition functions without any further model updates, achieving high efficiency and strong generalization across unseen problems. Key contributions include synthetic data generation for broad problem coverage, objective-aware regression heads, and two acquisition families (EI/UCB and UHVI/UR2I) that operate on aggregated posteriors. The approach demonstrates competitive or superior performance and scalable runtime behavior across synthetic, engineering-design, and HPO tasks, offering a practical, adaptable paradigm for real-world MOBO in diverse domains.

Abstract

Expensive multi-objective optimization is a prevalent and crucial concern in many real-world scenarios, where sample-efficiency is vital due to the limited evaluations to recover the true Pareto front for decision making. Existing works either involve rebuilding Gaussian process surrogates from scratch for each objective in each new problem encountered, or rely on extensive past domain experiments for pre-training deep learning models, making them hard to generalize and impractical to cope with various emerging applications in the real world. To address this issue, we propose a new paradigm named FoMEMO (Foundation Models for Expensive Multi-objective Optimization), which enables the establishment of a foundation model conditioned on any domain trajectory and user preference, and facilitates fast in-context optimization based on the predicted preference-wise aggregated posteriors. Rather than accessing extensive real-world domain experiments for training, we demonstrate that pre-training the foundation model with a diverse set of hundreds of millions of synthetic data can lead to superior generalization and optimization performance to unknown problems, without necessitating any subsequent model training or updates in the following optimization process.

Paper Structure

This paper contains 46 sections, 4 theorems, 24 equations, 4 figures, 11 tables.

Key Result

Theorem 1.1

Let $D_n=\{(\boldsymbol{x}_{i},\boldsymbol{y}_{i})\}_{i=1}^{n}$ be a set of $n$ evaluated samples, let $\boldsymbol{z}^*=(z_1^*, \cdots, z_m^*)^T$ be the ideal point, and let $\boldsymbol{r}=(r_1, \cdots, r_m)^T$ be a pre-defined reference point. The hypervolume of the evaluated samples can be expressed as an expectation over scalarizations, scaled by a constant $c_m=\frac{\pi^{m / 2}}{2^m \Gamma(m / 2+1)}$ that depends only on $m$.
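
The hypervolume scalarization idea can be checked numerically. The sketch below is not the paper's exact statement. It assumes the commonly cited form for minimization problems, $\mathrm{HV}_{\boldsymbol{r}}(Y) = c_m\,\mathbb{E}_{\boldsymbol{\lambda}}\big[\max_{\boldsymbol{y}\in Y}\min_i \max(0,(r_i-y_i)/\lambda_i)^m\big]$, with $\boldsymbol{\lambda}$ drawn uniformly from the positive part of the unit sphere. The function name `hv_scalarization_estimate` and its parameters are hypothetical, and the ideal point $\boldsymbol{z}^*$ is not used here.

```python
import math
import random


def hv_scalarization_estimate(points, ref, n_samples=50_000, seed=0):
    """Monte Carlo hypervolume estimate (minimization) via random scalarizations.

    Assumed form: HV = c_m * E_lambda[ max_y min_i max(0, (r_i - y_i) / lambda_i)^m ],
    with lambda uniform on the positive unit sphere. This is an illustrative
    assumption, not necessarily the exact statement of Theorem 1.1.
    """
    rng = random.Random(seed)
    m = len(ref)
    # Constant c_m as given in the theorem statement.
    c_m = math.pi ** (m / 2) / (2 ** m * math.gamma(m / 2 + 1))
    # Per-objective gaps to the reference point, clipped at zero.
    gaps = [[max(0.0, r - y) for r, y in zip(ref, p)] for p in points]

    total = 0.0
    for _ in range(n_samples):
        # Uniform direction on the positive unit sphere: normalize |Gaussian| draws.
        v = [abs(rng.gauss(0.0, 1.0)) for _ in range(m)]
        norm = math.sqrt(sum(x * x for x in v))
        lam = [max(x / norm, 1e-12) for x in v]  # guard against division by zero
        # Best scalarized value over all evaluated points for this direction.
        total += max(
            min(g / l for g, l in zip(gap, lam)) ** m for gap in gaps
        )
    return c_m * total / n_samples


if __name__ == "__main__":
    # Two non-dominated points in 2-D; the exact hypervolume w.r.t. (1, 1) is 0.75.
    print(hv_scalarization_estimate([(0.0, 0.5), (0.5, 0.0)], (1.0, 1.0)))
```

Because each random direction yields a single scalar objective, the same expectation can be approximated with a finite set of preferences, which is what makes scalarization-based acquisition functions convenient to build on preference-conditioned posteriors.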

Figures (4)

  • Figure 1: Overview of FoMEMO framework, including synthetic pre-training stage (left) and in-context optimization stage (right). During synthetic pre-training, the foundation model is fed with a large set of synthetic data as the context inputs, which include the trajectory pairs $D_n=\{(\boldsymbol{x}_i,\boldsymbol{y}_i)\}_{i=1}^n$, the query input $\boldsymbol{x}$ with its aggregation target $g=-s_{\boldsymbol{\lambda}}(\boldsymbol{x})$ masked, and the corresponding artificial preference $\boldsymbol{\lambda}$ sampled on the simplex. The model parameterized by $\boldsymbol{\theta}$ is trained only once to predict the aggregated posterior distributions $q_{\boldsymbol{\theta}}(g|\boldsymbol{x},D_n;\boldsymbol{\lambda})$ as the output, conditioned on the contextual information. During the in-context optimization stage, users can readily address new problems by simply providing evaluated trajectories from arbitrary unseen real-world applications, along with any potential user preferences (if applicable) as prompt inputs for the foundation model. Using the predicted aggregated posteriors as the basis, various acquisition functions can be naturally derived to enable fast in-context optimization. These acquisition functions can be efficiently optimized to suggest the next candidate within the utility of interest, free from any additional model training or updates throughout the entire optimization process.
  • Figure 2: The evolution of modeling accuracy in terms of regression error metrics (RMSE, MAE) and $R^2$ score during the training process.
  • Figure 3: Details of the normalized objective functions for the ToyRobust problem, the aggregated posteriors (green solid lines with shaded regions) conditioned on 7 observations and 5 preferences, and the corresponding UCB acquisition functions (red solid lines).
  • Figure 4: Convergence of the mean loss under different model configurations.

Theorems & Definitions (4)

  • Theorem 1.1: Hypervolume Scalarization
  • Lemma 1.2: Uncertainty-aware Hypervolume Improvement
  • Theorem 1.3: R2 Indicator
  • Lemma 1.4: Uncertainty-aware R2 Improvement
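
For readers unfamiliar with the R2 indicator named in Theorem 1.3, the following is a minimal sketch of the standard unary R2 indicator with weighted Tchebycheff utilities (lower is better, minimization assumed). This standard definition is an assumption here; the paper's theorem may state a related but different form, and the function name `r2_indicator` is hypothetical.

```python
def r2_indicator(points, weights, ideal):
    """Standard unary R2 indicator with weighted Tchebycheff utilities.

    points  : evaluated objective vectors (minimization assumed)
    weights : finite set of preference vectors approximating the simplex
    ideal   : ideal point z*
    Illustrative assumption; not necessarily the form used in Theorem 1.3.
    """
    total = 0.0
    for lam in weights:
        # For each preference, take the best (smallest) Tchebycheff value
        # achieved by any evaluated point.
        total += min(
            max(l * abs(y - z) for l, y, z in zip(lam, p, ideal))
            for p in points
        )
    return total / len(weights)


if __name__ == "__main__":
    front = [(0.0, 1.0), (1.0, 0.0)]
    prefs = [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]
    print(r2_indicator(front, prefs, ideal=(0.0, 0.0)))
```

Adding a point that lowers the Tchebycheff value for at least one preference reduces the indicator, which is the kind of improvement an R2-based acquisition function would reward.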