A Continuous Relaxation for Discrete Bayesian Optimization

Richard Michael; Simon Bartels; Miguel González-Duque; Yevgen Zainchkovskyy; Jes Frellsen; Søren Hauberg; Wouter Boomsma

TL;DR

This work tackles discrete sequence optimization under strict, expensive evaluation budgets by reframing the problem in a continuous probability space. It introduces Continuously Relaxed Bayesian Optimization (CoRel), which places a Gaussian process prior over a relaxed objective $\bar{f}$ computed as the expectation under a distribution over sequences, and uses a weighted Hellinger kernel to incorporate prior knowledge. The approach enables acquisition optimization via discrete, continuous, or manifold methods and is instantiated with a product-kernel model over latent subsets; empirical results on GFP and RFP tasks show improved performance in cold-start, low-budget settings compared to state-of-the-art baselines. The study highlights the value of priors and surrogate choices in discrete BO and provides practical software for researchers in protein sequence design and related domains.

Abstract

Optimizing efficiently over discrete data with only a few available target observations is a challenge in Bayesian optimization. We propose a continuous relaxation of the objective function and show that inference and optimization can be computationally tractable. We consider in particular the optimization domain where very few observations and strict budgets exist, motivated by optimizing protein sequences for expensive-to-evaluate biochemical properties. The advantages of our approach are twofold: the problem is treated in the continuous setting, and available prior knowledge over sequences can be incorporated directly. More specifically, we use available and learned distributions over the problem domain to weight the Hellinger distance, which yields a covariance function. We show that the resulting acquisition function can be optimized with either continuous or discrete optimization algorithms, and we empirically assess our method on two biochemical sequence optimization tasks.
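The idea of turning a weighted Hellinger distance into a covariance function can be illustrated with a minimal sketch. This summary does not give the kernel's exact form, so the following are assumptions made only for illustration: each input is a factorized distribution with one categorical per sequence position, the weighting is a hypothetical nonnegative per-position vector `w`, and the kernel is the exponential of the negative weighted squared Hellinger distance with a hypothetical `lengthscale` parameter. The names `weighted_sq_hellinger` and `hellinger_kernel` are likewise illustrative and not taken from the authors' software.

```python
import numpy as np


def weighted_sq_hellinger(p, q, w):
    """Weighted squared Hellinger distance between two factorized distributions.

    p, q: arrays of shape (L, A); each row is a categorical distribution
          over A tokens at one of L sequence positions.
    w:    nonnegative per-position weights of shape (L,) (assumed weighting).
    """
    # Squared Hellinger distance per position: 0.5 * ||sqrt(p_i) - sqrt(q_i)||^2
    per_position = 0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2, axis=1)
    return float(np.dot(w, per_position))


def hellinger_kernel(P, w, lengthscale=1.0):
    """Gram matrix k(p, q) = exp(-d_w(p, q) / lengthscale) (assumed form)."""
    n = len(P)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = np.exp(-weighted_sq_hellinger(P[i], P[j], w) / lengthscale)
    return K


# Toy data: n distributions over sequences of length L with alphabet size A.
rng = np.random.default_rng(0)
n, L, A = 5, 4, 3
P = rng.dirichlet(np.ones(A), size=(n, L))  # shape (n, L, A)
w = np.full(L, 1.0 / L)                     # uniform weights as a placeholder

K = hellinger_kernel(P, w)
min_eigenvalue = np.linalg.eigvalsh(K).min()
```

Under these assumptions the weighted squared distance is a squared Euclidean distance between scaled square-root embeddings, so the resulting Gram matrix is symmetric positive semi-definite and can serve as a Gaussian process covariance.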

Paper Structure (42 sections, 4 theorems, 13 equations, 11 figures, 4 algorithms)

Introduction
Background
Problem statement
Bayesian optimization
Gaussian process regression
Related work
Issues with Gaussian process models over latent space representations
Issues with Bayesian optimization's budget
Continuously Relaxed Bayesian Optimization
From discrete to continuous space
Representation
Inference
Optimization
The model
The weighted Hellinger kernel
...and 27 more sections

Key Result

Proposition 1

For $f$ and $\bar{f}$: $\operatorname*{arg\:min} f=\operatorname*{arg\:min} \bar{f}$.
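A small numerical check can make the statement concrete. The sketch below assumes, following the TL;DR, that $\bar{f}(p) = \mathbb{E}_{x \sim p}[f(x)]$, and it identifies each sequence with the point-mass distribution on it; the toy objective `f`, the two-letter alphabet, and the target string `"ABA"` are hypothetical and chosen only so that the minimizer is unique.

```python
import itertools

import numpy as np

# Hypothetical toy domain: all sequences of length 3 over the alphabet {A, B}.
alphabet = "AB"
length = 3
sequences = ["".join(s) for s in itertools.product(alphabet, repeat=length)]


def f(seq, target="ABA"):
    """Toy discrete objective: Hamming distance to a hypothetical target."""
    return float(sum(a != b for a, b in zip(seq, target)))


f_values = np.array([f(s) for s in sequences])


def f_bar(p):
    """Relaxed objective: expectation of f under a distribution p over sequences."""
    return float(p @ f_values)


# Point masses (vertices of the simplex) recover the discrete objective.
vertices = np.eye(len(sequences))
vertex_values = np.array([f_bar(v) for v in vertices])

# Random interior distributions never beat the best point mass,
# because f_bar is a convex combination of the values of f.
rng = np.random.default_rng(0)
random_values = np.array(
    [f_bar(p) for p in rng.dirichlet(np.ones(len(sequences)), size=1000)]
)

best_seq = sequences[int(np.argmin(vertex_values))]
```

In this toy setting the minimizer of the relaxed objective is the point mass on the minimizer of `f`, which is the correspondence the proposition expresses; the paper's own proof may rely on a different parameterization of the distributions.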

Figures (11)

Figure 1: The proposed problem transformation: given a set of sequences of discrete elements ($X$; blocks denote the elements in a sequence) with discontinuous observations (left), we continuously relax the objective to $\bar{f}$ (right) and assign a $\mathcal{GP}$ prior to it. The probability space over the elements is given by $\phi$, a pretrained prior model that parameterizes the distributions over the sequences and the elements therein (middle, bottom).
Figure 2: Visual comparison of the (weighted) Hellinger distance kernel and a Matérn 5/2 kernel. We use a two-dimensional version of the decoder proposed by Brookes et al. (2019). For the Matérn kernel we visualize $k(0, [z_1, z_2])$, whereas for the Hellinger kernel we show $k(P(x\,|\, 0), P(x'\,|\, [z_1,z_2]))$. With the Hellinger kernel, the decoder induces a much more complex, non-Euclidean similarity measure on the latent space. Note that the Hellinger kernels are non-stationary in the latent space; a figure in the supplementary material displays the same visualization for a different reference point.
Figure 3: Two-dimensional VAE latent space adapted from Brookes et al. (2019). We encode the GFP corpus of experimentally evaluated sequences (dots). Only oracle evaluations are available for optimization; markers show oracle predictions (top ▲, median ●, and lowest ▼, 10 observations each). Start is the reference wild type, and target is the maximally fluorescent candidate.
Figure 4: The GFP sequences are optimized by CoRel and by random mutations with the CBas oracle ($\hat{f}_{\text{GFP}}$) over 100 steps. We show the best sequences selected at each iteration, with mean (line) and 95% CI (shaded) across seven seeds.
Figure 5: The RFP Pareto front is optimized discretely, and we compute the hypervolume relative to the starting sequences, comparing CoRel with LamBO starting from six observations. Markers (●) indicate the batch average, with standard-error bars across 21 seeds (random 5).
...and 6 more figures

Theorems & Definitions (8)

Proposition 1
proof
Proposition 2
proof
Proposition 3
proof
Proposition 4
proof
