Leveraging Discrete Function Decomposability for Scientific Design

James C. Bowden; Sergey Levine; Jennifer Listgarten

TL;DR

The paper tackles in silico design in large discrete spaces by exploiting decomposability of the predictive objective, which can be written over the nodes and edges $\mathcal{E}$ of a junction tree as $f(x)=\sum_{i} f_i(\tilde{x}_i)+\sum_{(i,j)\in \mathcal{E}} f_{i,j}(\tilde{x}_i,\tilde{x}_j)$, where $\tilde{x}_i$ denotes the design variables grouped in node $i$. It introduces DADO, a distributional optimization method that builds a factorized search distribution $p_\theta(x)=p_\theta(\tilde{x}_r)\prod_{i\neq r} p_\theta(\tilde{x}_i\mid \tilde{x}_{p(i)})$, with $r$ the root node and $p(i)$ the parent of node $i$, and uses distributional value functions to coordinate updates via message passing. Empirical results show that DADO outperforms naive decomposition-unaware EDAs on synthetic landscapes and protein-design tasks, particularly as the design space grows or decomposability increases. The work provides a reproducible framework for AI-guided design in structured discrete spaces, with open-source code.
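To make the two formulas above concrete, the following is a minimal sketch, not the paper's implementation. It builds a toy tree-structured objective and a tree-factorized distribution, then computes the expected objective $\mathbb{E}_{p}[f(x)]$ with a leaf-to-root recursion and checks it against brute-force enumeration. The tree shape, the table values, and all names (`PARENT`, `make_problem`, `subtree_value`, `expected_f`) are assumptions made for illustration. The recursion is only in the spirit of a value function, and the paper's exact definition of its distributional value functions may differ.

```python
import itertools
import random

# Hypothetical toy setup (not from the paper): four node variables on a tree.
# PARENT[i] is p(i); node 0 is the root. Indices are in topological order.
PARENT = {0: None, 1: 0, 2: 0, 3: 1}
CHILDREN = {i: [c for c, p in PARENT.items() if p == i] for i in PARENT}
STATES = (0, 1, 2)


def random_dist(rng):
    """Return a random probability vector over STATES."""
    w = [rng.random() + 0.1 for _ in STATES]
    z = sum(w)
    return [v / z for v in w]


def make_problem(seed=0):
    """Draw illustrative objective tables and a tree-factorized distribution."""
    rng = random.Random(seed)
    unary = {i: [rng.uniform(-1, 1) for _ in STATES] for i in PARENT}
    # pair[i][a][b] scores node i in state a with its parent in state b.
    pair = {i: [[rng.uniform(-1, 1) for _ in STATES] for _ in STATES]
            for i in PARENT if PARENT[i] is not None}
    root = random_dist(rng)
    # cond[i][b] is the distribution of node i given parent state b.
    cond = {i: [random_dist(rng) for _ in STATES] for i in pair}
    return unary, pair, root, cond


def f(x, unary, pair):
    """Decomposable objective: node terms plus edge terms."""
    return (sum(unary[i][x[i]] for i in PARENT)
            + sum(pair[i][x[i]][x[PARENT[i]]] for i in pair))


def prob(x, root, cond):
    """p(x) = p(x_r) * prod_{i != r} p(x_i | x_{p(i)})."""
    p = root[x[0]]
    for i in cond:
        p *= cond[i][x[PARENT[i]]][x[i]]
    return p


def subtree_value(i, b, unary, pair, cond):
    """Expected score of the subtree rooted at i, given parent state b."""
    return sum(
        cond[i][b][a] * (unary[i][a] + pair[i][a][b]
                         + sum(subtree_value(c, a, unary, pair, cond)
                               for c in CHILDREN[i]))
        for a in STATES)


def expected_f(unary, pair, root, cond):
    """E_p[f(x)] via leaf-to-root recursion, without enumerating all x."""
    return sum(
        root[a] * (unary[0][a]
                   + sum(subtree_value(c, a, unary, pair, cond)
                         for c in CHILDREN[0]))
        for a in STATES)


def expected_f_brute_force(unary, pair, root, cond):
    """Same expectation by enumerating every design (exponential cost)."""
    return sum(prob(x, root, cond) * f(x, unary, pair)
               for x in itertools.product(STATES, repeat=len(PARENT)))


problem = make_problem()
fast = expected_f(*problem)
slow = expected_f_brute_force(*problem)
print(abs(fast - slow) < 1e-9)
```

The recursion visits each edge with a small table of parent states, whereas the brute-force version sums over every joint assignment. This is the kind of saving that a tree-structured decomposition makes available. A practical version would cache the `subtree_value` results rather than recompute them.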

Abstract

In the era of AI-driven science and engineering, we often want to design discrete objects in silico according to user-specified properties. For example, we may wish to design a protein to bind its target, arrange components within a circuit to minimize latency, or find materials with certain properties. Given a property predictive model, in silico design typically involves training a generative model over the design space (e.g., protein sequence space) to concentrate on designs with the desired properties. Distributional optimization -- which can be formalized as an estimation of distribution algorithm or as reinforcement learning policy optimization -- finds the generative model that maximizes an objective function in expectation. Optimizing a distribution over discrete-valued designs is in general challenging because of the combinatorial nature of the design space. However, many property predictors in scientific applications are decomposable in the sense that they can be factorized over design variables in a way that could in principle enable more effective optimization. For example, amino acids at a catalytic site of a protein may only loosely interact with amino acids of the rest of the protein to achieve maximal catalytic activity. Current distributional optimization algorithms are unable to make use of such decomposability structure. Herein, we propose and demonstrate use of a new distributional optimization algorithm, Decomposition-Aware Distributional Optimization (DADO), that can leverage any decomposability defined by a junction tree on the design variables, to make optimization more efficient. At its core, DADO employs a soft-factorized "search distribution" -- a learned generative model -- for efficient navigation of the search space, invoking graph message-passing to coordinate optimization across linked factors.
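As a point of reference for the distributional optimization described above, here is a minimal sketch of a generic, decomposition-unaware estimation of distribution algorithm. It is not DADO. It keeps an independent categorical distribution per design position, samples a population, and refits the distribution to the top-scoring samples. The toy objective, the sizes `L` and `K`, the target sequence, and the hyperparameters are all hypothetical choices made for illustration.

```python
import random

L = 8  # number of design positions (hypothetical toy size)
K = 4  # alphabet size per position
TARGET = [0, 1, 2, 3, 0, 1, 2, 3]  # illustrative optimum, not from the paper


def objective(x):
    """Toy score: per-position matches plus a bonus for each adjacent
    pair in which the right symbol is the left symbol plus one (mod K)."""
    matches = sum(1.0 for a, t in zip(x, TARGET) if a == t)
    pairs = sum(0.5 for a, b in zip(x, x[1:]) if b == (a + 1) % K)
    return matches + pairs


def run_eda(iterations=30, samples=200, elite_frac=0.2, smoothing=0.1, seed=0):
    """Fully factorized EDA: sample, select the elite, refit per-position marginals."""
    rng = random.Random(seed)
    probs = [[1.0 / K] * K for _ in range(L)]  # start from the uniform distribution
    history = []  # mean objective of each sampled population
    for _ in range(iterations):
        population = [
            [rng.choices(range(K), weights=probs[i])[0] for i in range(L)]
            for _ in range(samples)
        ]
        history.append(sum(objective(x) for x in population) / samples)
        population.sort(key=objective, reverse=True)
        elite = population[: int(elite_frac * samples)]
        for i in range(L):
            # Smoothed empirical frequencies of each symbol among the elite.
            counts = [smoothing] * K
            for x in elite:
                counts[x[i]] += 1
            total = sum(counts)
            probs[i] = [c / total for c in counts]
    return probs, history


probs, history = run_eda()
print(history[0], history[-1])
```

Because every position is modeled independently, this baseline ignores the pairwise terms in the objective when it refits the distribution. That is the structure a decomposition-aware method is meant to exploit.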
