Joint Composite Latent Space Bayesian Optimization

Natalie Maus, Zhiyuan Jerry Lin, Maximilian Balandat, Eytan Bakshy

TL;DR

JoCo introduces a scalable approach to Bayesian Optimization (BO) for composite objectives $f(x)=g(h(x))$ in which both the input $x$ and the intermediate output $h(x)$ are high-dimensional. It jointly trains two neural encoders and two Gaussian-process surrogates to learn latent representations that preserve information relevant to the final reward, enabling effective BO on the compressed spaces via Thompson sampling within a TuRBO trust region. Across nine diverse problems (synthetic benchmarks, environmental modeling, a PDE problem, rover trajectory planning, and adversarial prompt optimization for LLMs and image models), JoCo consistently outperforms baselines and makes strong progress early in the optimization. Ablations show that both joint training and the trust-region dynamics matter. The work broadens the applicability of BO to complex, high-dimensional problems and demonstrates practical utility in AI safety-adjacent tasks, with code available for replication and further development.
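
The trust-region dynamics mentioned above can be illustrated with a minimal sketch of the bookkeeping commonly associated with TuRBO: the region expands after a run of consecutive improvements and shrinks after a run of consecutive failures. The class name `TrustRegion`, its fields, and all numeric constants are illustrative assumptions, not values taken from the paper or its code.

```python
from dataclasses import dataclass


@dataclass
class TrustRegion:
    """Hypothetical TuRBO-style trust-region state (all constants are placeholders)."""
    length: float = 0.8
    length_min: float = 0.5 ** 7
    length_max: float = 1.6
    success_tolerance: int = 3
    failure_tolerance: int = 5
    successes: int = 0
    failures: int = 0
    best_value: float = float("-inf")
    needs_restart: bool = False

    def update(self, new_value: float) -> None:
        """Record one evaluation and adapt the side length (maximization)."""
        if new_value > self.best_value:
            self.best_value = new_value
            self.successes += 1
            self.failures = 0
        else:
            self.failures += 1
            self.successes = 0

        if self.successes >= self.success_tolerance:
            # Enough consecutive improvements: double the region, up to a cap.
            self.length = min(2.0 * self.length, self.length_max)
            self.successes = 0
        elif self.failures >= self.failure_tolerance:
            # Enough consecutive failures: halve the region.
            self.length /= 2.0
            self.failures = 0

        if self.length < self.length_min:
            self.needs_restart = True

    def bounds(self, center, lower=0.0, upper=1.0):
        """Axis-aligned box around `center`, clipped to [lower, upper]."""
        half = self.length / 2.0
        return [(max(lower, c - half), min(upper, c + half)) for c in center]


tr = TrustRegion()
for value in [1.0, 2.0, 3.0]:   # three consecutive improvements
    tr.update(value)
expanded_length = tr.length
for _ in range(5):              # five consecutive failures
    tr.update(0.0)
shrunk_length = tr.length
box = tr.bounds([0.5, 0.1])
```

In a latent-space method such as the one described here, the box would presumably be centered on the latent encoding of the best point found so far, and candidates for Thompson sampling would be drawn inside it.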

Abstract

Bayesian Optimization (BO) is a technique for sample-efficient black-box optimization that employs probabilistic models to identify promising input locations for evaluation. When dealing with composite-structured functions, such as $f = g \circ h$, evaluating a specific location $x$ yields observations of both the final outcome $f(x) = g(h(x))$ as well as the intermediate output(s) $h(x)$. Previous research has shown that integrating information from these intermediate outputs can enhance BO performance substantially. However, existing methods struggle if the outputs $h(x)$ are high-dimensional. Many relevant problems fall into this setting, including in the context of generative AI, molecular design, or robotics. To effectively tackle these challenges, we introduce Joint Composite Latent Space Bayesian Optimization (JoCo), a novel framework that jointly trains neural network encoders and probabilistic models to adaptively compress high-dimensional input and output spaces into manageable latent representations. This enables viable BO on these compressed representations, allowing JoCo to outperform other state-of-the-art methods in high-dimensional BO on a wide variety of simulated and real-world problems.
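
To make the composite setting concrete, the sketch below shows what a single evaluation returns: the intermediate output $h(x)$ and the final value $f(x) = g(h(x))$. The particular functions `h` and `g`, the dimensions, and the helper `evaluate` are hypothetical and chosen only for illustration; they are not any of the paper's benchmark problems.

```python
import math
import random


def h(x):
    """Hypothetical inner map: input vector -> higher-dimensional intermediate output."""
    scale = len(x)
    return [math.sin(sum((i + 1) * xi for xi in x) / scale) for i in range(4 * len(x))]


def g(y):
    """Hypothetical outer map: intermediate output -> scalar reward."""
    return -sum(yi ** 2 for yi in y) / len(y)


def evaluate(x):
    """One black-box evaluation yields both h(x) and f(x) = g(h(x))."""
    y = h(x)
    return y, g(y)


rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(8)]
y, fx = evaluate(x)

# A composite BO method stores the full triple, not just (x, f(x)).
dataset = [(x, y, fx)]
```

A standard BO method would keep only the pairs $(x, f(x))$; composite methods also model $y = h(x)$, which is what becomes difficult when $y$ is high-dimensional.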

Paper Structure (54 sections, 2 equations, 12 figures, 1 table, 1 algorithm)

Figures (12)

  • Figure 1: JoCo architecture: Two NN encoders, $\textcolor{red}{\mathcal{E_X}}$ and $\textcolor{blue}{\mathcal{E_Y}}$, embed the high-dimensional input and intermediate output spaces into lower-dimensional latent spaces, $\hat{\mathcal{X}}$ and $\hat{\mathcal{Y}}$, respectively. The latent probabilistic model $\textcolor{red}{\hat{h}}$ maps the embedded input space to a distribution over the embedded intermediate output space $\hat{\mathcal{Y}}$, while $\textcolor{blue}{\hat{g}}$ maps $\hat{\mathcal{Y}}$ to a distribution over possible composite function values. Together, these components enable effective high-dimensional optimization by jointly learning representations that enable accurate prediction and optimization of the composite function $\textcolor{#800080}{f}$.
  • Figure 2: JoCo outperforms other baselines across nine high-dimensional composite BO tasks. Top row: Results for the five composite BO tasks including synthetic functions (Langermann, Rosenbrock) and problems motivated by real-world applications (environment modeling, PDE, and rover trajectory planning). Bottom row: Results for the large language model and image generation prompt optimization tasks.
  • Figure 3: Toxic text generation task, examples of successful prompts/replies found by JoCo.
  • Figure 4: Examples of successful prompts found by JoCo for various image generation tasks. Panels depict the results of applying JoCo to trick a text-to-image model into generating images of sports cars (a), dogs (b), and aircraft (c), respectively, despite no individual words related to the target objects being present in the prompts. For dogs and aircraft, the prompts additionally contain a set of misleading tokens.
  • Figure 5: Performance comparison of JoCo under three training schemes: (1) JoCo: continuous joint updating of encoders and GPs, where both components are updated together throughout the optimization; (2) Not Updating Models: the models are not updated after initial training; (3) W/o Joint Training: $\textcolor{red}{\mathcal{E_X}}$ and $\textcolor{red}{\hat{h}}$ are updated first, followed by a separate update of $\textcolor{blue}{\mathcal{E_Y}}$ and $\textcolor{blue}{\hat{g}}$. We observe a notable performance degradation when deviating from the joint and continuous updating scheme, which is particularly pronounced in the more complex generative AI tasks.
  • ...and 7 more figures
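
The data flow described in the Figure 1 caption can be sketched as follows. This is only a shape-level illustration under strong assumptions: the encoders are stood in for by fixed random linear maps rather than trained neural networks, and the latent probabilistic models $\hat{h}$ and $\hat{g}$ are stood in for by a linear mean plus Gaussian noise rather than Gaussian processes. All names and dimensions are hypothetical.

```python
import random

rng = random.Random(0)


def random_matrix(rows, cols):
    """Random linear map used as a stand-in for a trained component."""
    return [[rng.gauss(0.0, cols ** -0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


D_X, D_Y = 64, 128       # hypothetical input / intermediate-output dimensions
D_XHAT, D_YHAT = 4, 6    # hypothetical latent dimensions

W_X = random_matrix(D_XHAT, D_X)     # stand-in for encoder E_X
W_Y = random_matrix(D_YHAT, D_Y)     # stand-in for encoder E_Y
W_H = random_matrix(D_YHAT, D_XHAT)  # mean of stand-in latent model h_hat
W_G = random_matrix(1, D_YHAT)       # mean of stand-in latent model g_hat


def encode_x(x):
    return matvec(W_X, x)


def encode_y(y):
    return matvec(W_Y, y)


def sample_h_hat(x_hat, noise=0.1):
    """One draw from a stand-in predictive distribution over the latent output."""
    return [m + rng.gauss(0.0, noise) for m in matvec(W_H, x_hat)]


def sample_g_hat(y_hat, noise=0.1):
    """One draw from a stand-in predictive distribution over the composite value."""
    return matvec(W_G, y_hat)[0] + rng.gauss(0.0, noise)


def sample_f(x):
    """Chain the pieces: x -> E_X -> h_hat -> g_hat -> sampled value of f."""
    return sample_g_hat(sample_h_hat(encode_x(x)))


x = [rng.uniform(-1.0, 1.0) for _ in range(D_X)]
y = [rng.uniform(-1.0, 1.0) for _ in range(D_Y)]  # would be the observed h(x)

x_hat = encode_x(x)          # input to h_hat
y_hat_target = encode_y(y)   # target that h_hat is trained to predict
f_sample = sample_f(x)
```

Note that `encode_y` is used only to produce training targets for the latent model over $\hat{\mathcal{Y}}$; prediction for a new input runs through `encode_x`, `sample_h_hat`, and `sample_g_hat`.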