Variational Search Distributions
Daniel M. Steinberg, Rafael Oliveira, Cheng Soon Ong, Edwin V. Bonilla
TL;DR
Steinberg et al. address batch active generation: learning a conditional generative model of rare, desirable designs in combinatorial spaces. They introduce Variational Search Distributions (VSD), a variational-inference framework that approximates the level-set posterior $p(\mathbf{x}|y>\tau)$ with a parameterized distribution $q(\mathbf{x}|\boldsymbol{\phi})$. VSD maximizes an ELBO in which the probability that a design exceeds the threshold $\tau$ comes from either a Gaussian process probability-of-improvement (GP-PI) surrogate or a neural-network class-probability estimator. The paper provides asymptotic convergence guarantees for the learned distribution under GP and NTK-based neural models. It illustrates the generative model on handwritten-digit images and reports that VSD can outperform baselines on sequence-design tasks (DHFR, TrpB, TFBIND8, Ehrlich, GFP, AAV). The results suggest that VSD scales to high-dimensional, discrete design spaces and can guide batch experiments, with potential applications in protein and DNA/RNA engineering and other combinatorial design problems.
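As a sketch of the variational objective behind this summary, assume a prior over designs $p(\mathbf{x})$ and a surrogate estimate of $p(y>\tau|\mathbf{x})$ fitted to the data observed so far (the prior symbol and the omission of explicit conditioning on data are notational assumptions made here for brevity). Bayes' rule defines the level-set posterior:

$$p(\mathbf{x}|y>\tau) = \frac{p(y>\tau|\mathbf{x})\,p(\mathbf{x})}{p(y>\tau)}$$

The log evidence then splits into an ELBO plus the KL divergence from $q$ to this posterior, so maximizing the ELBO over $\boldsymbol{\phi}$ is equivalent to minimizing that divergence:

$$\log p(y>\tau) = \underbrace{\mathbb{E}_{q(\mathbf{x}|\boldsymbol{\phi})}\big[\log p(y>\tau|\mathbf{x})\big] - \mathrm{KL}\big[q(\mathbf{x}|\boldsymbol{\phi})\,\|\,p(\mathbf{x})\big]}_{\text{ELBO}(\boldsymbol{\phi})} + \mathrm{KL}\big[q(\mathbf{x}|\boldsymbol{\phi})\,\|\,p(\mathbf{x}|y>\tau)\big]$$

The first ELBO term rewards designs that the surrogate believes exceed the threshold, while the second keeps $q$ close to the prior over designs.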
Abstract
We develop VSD, a method for conditioning a generative model of discrete, combinatorial designs on a rare desired class by efficiently evaluating a black-box function (e.g., an experiment or simulation) in a batch-sequential manner. We call this task active generation; we formalize active generation's requirements and desiderata, and formulate a solution via variational inference. VSD uses off-the-shelf gradient-based optimization routines, can learn powerful generative models for desirable designs, and can take advantage of scalable predictive models. We derive asymptotic convergence rates for learning the true conditional generative distribution of designs with certain configurations of our method. After illustrating the generative model on images, we empirically demonstrate that VSD can outperform existing baseline methods on a set of real sequence-design problems in various protein and DNA/RNA engineering tasks.
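To make the variational-inference formulation concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes a design space small enough to enumerate (six hypothetical designs), a uniform prior, and made-up values for the surrogate probability that each design exceeds the threshold. With those assumptions the ELBO and its gradient can be computed exactly, and plain gradient ascent on the logits of a categorical $q$ should recover the level-set posterior. A realistic setting would use sampled gradient estimates and a learned surrogate; the names `fit_search_distribution`, `p_exceed`, and the step-size settings are illustrative choices.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()


def elbo(phi, log_lik, log_prior):
    """Exact ELBO for a categorical q = softmax(phi) over an enumerable space.

    ELBO = E_q[log p(y > tau | x)] - KL[q || prior].
    """
    q = softmax(phi)
    return float(np.sum(q * (log_lik + log_prior - np.log(q))))


def fit_search_distribution(log_lik, log_prior, steps=3000, lr=1.0):
    """Maximize the exact ELBO by gradient ascent on the logits phi."""
    phi = np.zeros_like(log_lik)  # start from a uniform q
    for _ in range(steps):
        q = softmax(phi)
        f = log_lik + log_prior - np.log(q)
        # Exact gradient of the ELBO with respect to the softmax logits.
        grad = q * (f - np.sum(q * f))
        phi = phi + lr * grad
    return softmax(phi)


# Hypothetical toy problem: made-up surrogate values of p(y > tau | x)
# for six enumerated designs, with a uniform prior over designs.
p_exceed = np.array([0.05, 0.10, 0.60, 0.90, 0.20, 0.05])
log_lik = np.log(p_exceed)
log_prior = np.full(p_exceed.size, -np.log(p_exceed.size))

q = fit_search_distribution(log_lik, log_prior)

# Closed-form level-set posterior under the uniform prior, for comparison.
posterior = p_exceed / p_exceed.sum()

print("learned q :", np.round(q, 3))
print("posterior :", np.round(posterior, 3))
```

In this toy case the learned `q` can be checked against the closed-form posterior, which is only possible because the space is enumerable. In the combinatorial spaces the abstract targets, that normalization is intractable, which is the motivation for fitting a parameterized $q$ by gradient-based optimization instead.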
