Leveraging Discrete Function Decomposability for Scientific Design

James C. Bowden, Sergey Levine, Jennifer Listgarten

TL;DR

The paper tackles in silico design in large discrete spaces by exploiting decomposability of the predictive objective, which can be written on a junction tree as $f(x)=\sum_{i} f_i(\tilde{x}_i)+\sum_{(i,j)\in \mathcal{E}} f_{i,j}(\tilde{x}_i,\tilde{x}_j)$. It introduces DADO, a distributional optimization method that builds a factorized search distribution $p_\theta(x)=p_\theta(\tilde{x}_r)\prod_i p_\theta(\tilde{x}_i\mid \tilde{x}_{p(i)})$ and uses distributional value functions to coordinate updates via message-passing. Empirical results show DADO outperforms naive, decomposition-unaware EDAs on synthetic landscapes and protein-design tasks, particularly as design space size grows or decomposability increases. The work provides a reproducible framework for AI-guided design in structured discrete spaces with open-source code.

Abstract

In the era of AI-driven science and engineering, we often want to design discrete objects in silico according to user-specified properties. For example, we may wish to design a protein to bind its target, arrange components within a circuit to minimize latency, or find materials with certain properties. Given a property predictive model, in silico design typically involves training a generative model over the design space (e.g., protein sequence space) to concentrate on designs with the desired properties. Distributional optimization -- which can be formalized as an estimation of distribution algorithm or as reinforcement learning policy optimization -- finds the generative model that maximizes an objective function in expectation. Optimizing a distribution over discrete-valued designs is in general challenging because of the combinatorial nature of the design space. However, many property predictors in scientific applications are decomposable in the sense that they can be factorized over design variables in a way that could in principle enable more effective optimization. For example, amino acids at a catalytic site of a protein may only loosely interact with amino acids of the rest of the protein to achieve maximal catalytic activity. Current distributional optimization algorithms are unable to make use of such decomposability structure. Herein, we propose and demonstrate use of a new distributional optimization algorithm, Decomposition-Aware Distributional Optimization (DADO), that can leverage any decomposability defined by a junction tree on the design variables, to make optimization more efficient. At its core, DADO employs a soft-factorized "search distribution" -- a learned generative model -- for efficient navigation of the search space, invoking graph message-passing to coordinate optimization across linked factors.
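
To make the setting concrete, the following is a minimal sketch of a decomposable objective and of a decomposition-unaware EDA of the kind the abstract contrasts with DADO. It is not the authors' implementation: the chain-shaped tree, the Gaussian parameters, the fully independent categorical search distribution, the softmax sample weights, and all function names (`make_objective`, `naive_eda`) are assumptions chosen for illustration only.

```python
import math
import random


def make_objective(L, D, seed=0):
    """Toy f(x) = sum_i f_i(x_i) + sum_{(i,j)} f_ij(x_i, x_j) on a chain."""
    rng = random.Random(seed)
    unary = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(L)]
    edges = [(i - 1, i) for i in range(1, L)]  # a chain is the simplest tree (assumption)
    pair = {e: [[rng.gauss(0, 1) for _ in range(D)] for _ in range(D)] for e in edges}

    def f(x):
        score = sum(unary[i][x[i]] for i in range(L))
        score += sum(pair[(i, j)][x[i]][x[j]] for (i, j) in edges)
        return score

    return f


def naive_eda(f, L, D, K=100, iters=30, seed=0):
    """Decomposition-unaware EDA: every position is updated with whole-sample weights."""
    rng = random.Random(seed)
    probs = [[1.0 / D] * D for _ in range(L)]  # independent categorical per position
    history = []
    for _ in range(iters):
        samples = [[rng.choices(range(D), weights=probs[i])[0] for i in range(L)]
                   for _ in range(K)]
        scores = [f(x) for x in samples]
        history.append(sum(scores) / K)  # per-iteration mean of f over the samples
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]  # softmax weights (assumption)
        z = sum(weights)
        for i in range(L):
            counts = [1e-3] * D  # small smoothing keeps every symbol reachable
            for x, w in zip(samples, weights):
                counts[x[i]] += w / z  # same global weight, regardless of position i
            total = sum(counts)
            probs[i] = [c / total for c in counts]
    return probs, history


f = make_objective(L=8, D=4)
probs, history = naive_eda(f, L=8, D=4)
print(f"mean f(x): first iteration {history[0]:.2f}, last iteration {history[-1]:.2f}")
```

The point of the sketch is the update loop: each position receives the same weight derived from the full score $f(x)$, so a good choice at one position is credited or penalized according to unrelated choices elsewhere in the sequence. That is the inefficiency a decomposition-aware method aims to avoid.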

Paper Structure

This paper contains 37 sections, 36 equations, 8 figures, 1 algorithm.

Figures (8)

  • Figure 1: Key components of DADO. a, DADO requires as input an objective function in its decomposed form, $f(x) = C_1(\hat{x}_1) + C_2(\hat{x}_2) + \dots + C_\kappa(\hat{x}_\kappa)$, which corresponds to a junction tree. Here we show a junction tree with nodes of size 1, i.e., a regular tree, for simplicity. Variables with edges interact together to directly influence $f$. Some variables participate in multiple component functions, requiring coordination in the form of message-passing. b, To update the search distribution at each iteration, naive EDAs weight entire samples drawn from a joint distribution over all design variables by scoring with $f(x)$. Shades of green denote scores ranging from more to less optimal. In contrast, DADO leverages the decomposition of $f(x)$ to weight samples in a more local manner, according to the decomposition. Specifically, DADO uses message-passing to compute value functions that account for $x_i$ interacting with its descendants. Correspondingly, the value functions serve as the weights for each part of the search distribution, which is factorized like $f$. Optional shaping function $W$ is omitted for clarity. c, Example performance comparison on a synthetic problem with an exact tree decomposition over a discrete design space of size $20^{50}$ ($D=20, L=50$). Each of the two methods drew 100 samples per iteration. We evaluated these samples with $f(x)$, computing the per-iteration mean and 95% confidence interval. Results shown were averaged over 20 random seeds for the same $f(x)$ (details in the synthetic experiments section and Fig. 2). The p-value shown is from a two-sided paired t-test that AUC of the per-iteration means is different between methods, using the 20 mean curves.
  • Figure 2: Comparison of a naive EDA to DADO on synthetic problems. We created three random functions, $f(x)$, each with a randomly chosen junction tree decomposition with maximum node size of one, and randomly chosen parameters. Each experiment used alphabet size $D=20$ and sequence length a, $L=25$, b, $L=50$, and c, $L=200$. Each of the two methods drew $K=100$ samples per iteration. For each iteration, we show the mean (solid line) and 95% confidence interval (shaded envelope) of the 100 samples evaluated on $f(x)$, averaged across results from 20 random seeds. P-values are from two-sided paired t-tests that AUC of the per-iteration mean is different between methods, using the 20 mean curves.
  • Figure 3: Comparison of a naive EDA to DADO on protein property predictive models. For each of four proteins of varying length, a, Amyloid, b, AAV, c, GB1, and d, TDP-43, we fit a neural network property function, $f(x)$, adhering to a junction tree decomposition derived from the protein's 3D structure, and then used a naive EDA and DADO to optimize them. Each approach drew $K=1000$ samples per EDA iteration. For each iteration, we show the mean (solid line) and 95% confidence interval (shaded envelope) of the 1000 samples evaluated on $f(x)$, averaged across results from 20 random seeds. P-values are from two-sided paired t-tests that AUC of the per-iteration mean is different between methods, using the 20 mean curves.
  • Figure A1: Comparison of a standard EDA to DADO on synthetic problems drawing 100 samples each iteration. We created three random functions, $f(x)$, each with a randomly chosen junction tree decomposition with maximum node size of one, and randomly chosen parameters. Each experiment used alphabet size $D=20$ and sequence length a, $L=25$ (also Fig. 2a), b, $L=50$ (also Fig. 2b), and c, $L=100$. Each of the two methods drew $K=100$ samples per iteration. For each iteration, we show the mean (solid line) and 95% confidence interval (shaded envelope) of the 100 samples evaluated on $f(x)$, averaged across results from 20 random seeds. P-values are from two-sided paired t-tests that AUC of the per-iteration mean is different between methods, using the 20 mean curves.
  • Figure A2: Comparison of a standard EDA to DADO on synthetic problems drawing 1000 samples each iteration. We created three random functions, $f(x)$, each with a randomly chosen junction tree decomposition with maximum node size of one, and randomly chosen parameters. Each experiment used alphabet size $D=20$ and sequence length a, $L=25$, b, $L=50$, and c, $L=100$. Each of the two methods drew $K=1000$ samples per iteration. For each iteration, we show the mean (solid line) and 95% confidence interval (shaded envelope) of the 1000 samples evaluated on $f(x)$, averaged across results from 20 random seeds. P-values are from two-sided paired t-tests that AUC of the per-iteration mean is different between methods, using the 20 mean curves.
  • ...and 3 more figures
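
The Figure 1 caption describes value functions, computed by message-passing, that account for a variable interacting with its descendants. The sketch below illustrates that idea on a tabular toy problem and should not be read as the paper's algorithm. It assumes a tree with single-variable nodes, a search distribution of the form $p(x_r)\prod_i p(x_i \mid x_{p(i)})$ stored as explicit tables, and a value function defined here as the expected contribution of node $i$'s subtree given $x_i$. The paper's exact definition, including the optional shaping function $W$, may differ, and every name in the code is hypothetical.

```python
import itertools
import random


def random_tree_problem(L, D, seed=0):
    """Random tree (node 0 is the root), objective tables, and search-distribution tables."""
    rng = random.Random(seed)
    parent = [None] + [rng.randrange(i) for i in range(1, L)]  # parent index < child index
    unary = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(L)]
    # pair[i][a][b] = f_{p(i), i}(x_{p(i)} = a, x_i = b)
    pair = [None] + [[[rng.gauss(0, 1) for _ in range(D)] for _ in range(D)]
                     for _ in range(1, L)]

    def normalize(v):
        s = sum(v)
        return [u / s for u in v]

    root_p = normalize([rng.random() + 0.1 for _ in range(D)])
    # cond[i][a][b] = p(x_i = b | x_{p(i)} = a)
    cond = [None] + [[normalize([rng.random() + 0.1 for _ in range(D)]) for _ in range(D)]
                     for _ in range(1, L)]
    return parent, unary, pair, root_p, cond


def value_functions(parent, unary, pair, cond, D):
    """V[i][a]: expected sum of all objective terms in the subtree of i, given x_i = a."""
    L = len(parent)
    V = [list(unary[i]) for i in range(L)]
    # Children have larger indices than parents, so a reverse sweep goes leaves to root.
    for i in range(L - 1, 0, -1):
        p = parent[i]
        for a in range(D):
            # Message from child i to its parent: average over p(x_i | x_p = a).
            V[p][a] += sum(cond[i][a][b] * (pair[i][a][b] + V[i][b]) for b in range(D))
    return V


def brute_force_expectation(parent, unary, pair, root_p, cond, D):
    """E_p[f(x)] by enumerating all D**L designs; only feasible for tiny problems."""
    L = len(parent)
    total = 0.0
    for x in itertools.product(range(D), repeat=L):
        prob = root_p[x[0]]
        score = unary[0][x[0]]
        for i in range(1, L):
            prob *= cond[i][x[parent[i]]][x[i]]
            score += unary[i][x[i]] + pair[i][x[parent[i]]][x[i]]
        total += prob * score
    return total


L, D = 5, 3
parent, unary, pair, root_p, cond = random_tree_problem(L, D)
V = value_functions(parent, unary, pair, cond, D)
via_messages = sum(root_p[a] * V[0][a] for a in range(D))
exact = brute_force_expectation(parent, unary, pair, root_p, cond, D)
print(via_messages, exact)  # the two routes agree up to floating-point error
```

The leaves-to-root sweep costs on the order of $L \cdot D^2$ operations, whereas the enumeration costs $D^L$. Under the assumptions above, each table `V[i]` gives a per-node signal that depends only on node $i$ and its descendants, which is the kind of local weighting the caption contrasts with scoring whole samples by $f(x)$.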