Scalable and Cost-Efficient de Novo Template-Based Molecular Generation

Piotr Gaiński, Oussama Boussif, Andrei Rekesh, Dmytro Shevchuk, Ali Parviz, Mike Tyers, Robert A. Batey, Michał Koziarski

TL;DR

SCENT addresses the synthesis bottleneck in template-based molecular generation by introducing Recursive Cost Guidance, which steers backward transitions toward low-cost synthesis, complemented by Decomposability Guidance, an Exploitation Penalty, and a Dynamic Building Block Library. The backward policy is guided by a surrogate cost model $\hat{c}_B$, with specialized predictors $\hat{c}^S_B$ (synthesis cost) and $\hat{c}^D_B$ (decomposability) that reduce synthesis expense and enforce valid retrosynthesis. Dynamic Library augmentation expands the reachable chemical space and improves credit assignment through trajectory compression, enabling full-tree synthesis. Empirical results across SMALL, MEDIUM, and LARGE libraries and multiple drug design proxies show substantially lower synthesis cost, greater diversity (scaffolds and modes), and more high-reward molecules discovered than prior template-based GFlowNets. The work delivers a scalable, cost-aware framework, with publicly available code, for synthesis-aware molecular generation.

Abstract

Template-based molecular generation offers a promising avenue for drug design by ensuring generated compounds are synthetically accessible through predefined reaction templates and building blocks. In this work, we tackle three core challenges in template-based GFlowNets: (1) minimizing synthesis cost, (2) scaling to large building block libraries, and (3) effectively utilizing small fragment sets. We propose Recursive Cost Guidance, a backward policy framework that employs auxiliary machine learning models to approximate synthesis cost and viability. This guidance steers generation toward low-cost synthesis pathways, significantly enhancing cost-efficiency, molecular diversity, and quality, especially when paired with an Exploitation Penalty that balances the trade-off between exploration and exploitation. To enhance performance in smaller building block libraries, we develop a Dynamic Library mechanism that reuses intermediate high-reward states to construct full synthesis trees. Our approach establishes state-of-the-art results in template-based molecular generation.
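
To make the idea of cost-guided backward transitions concrete, here is a minimal sketch in Python. It assumes the guidance takes the form of a softmax over negated cost estimates, so that cheaper candidate parents receive higher backward probability. The function name `guided_backward_policy`, the `temperature` parameter, and the toy cost values are hypothetical illustrations; the paper's actual $P_B$ and cost models may be defined differently.

```python
import math


def guided_backward_policy(parent_costs, temperature=1.0):
    """Map estimated costs of candidate parent states to probabilities.

    ASSUMPTION: a softmax over negated costs; this is an illustrative
    choice, not necessarily the form used in the paper.
    """
    if not parent_costs:
        raise ValueError("at least one candidate parent is required")
    # Lower estimated cost -> larger logit.
    logits = [-cost / temperature for cost in parent_costs.values()]
    # Subtract the max logit for numerical stability.
    max_logit = max(logits)
    weights = [math.exp(logit - max_logit) for logit in logits]
    total = sum(weights)
    return {parent: w / total for parent, w in zip(parent_costs, weights)}


# Hypothetical cost estimates (stand-ins for a learned cost model's output).
probs = guided_backward_policy(
    {"parent_a": 1.0, "parent_b": 3.0, "parent_c": 10.0}
)
```

In this toy example the cheapest parent receives the largest share of probability mass, and raising `temperature` flattens the distribution toward uniform.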

Paper Structure

This paper contains 59 sections, 19 equations, 17 figures, 12 tables, 1 algorithm.

Figures (17)

  • Figure 1: a) Recursive Cost Guidance employs machine learning models $\hat{c}_B^S$ and $\hat{c}_B^D$ to estimate the intractable synthesis cost and decomposability of precursor molecules, guiding the backward policy $P_B$ toward cheaper and viable intermediates. b) The forward policy is augmented with an Exploitation Penalty to counter the overexploitation induced by the Synthesis Cost Guidance component of $P_B$. c) The Dynamic Library gathers the intermediate molecules with the highest expected reward and adds them to the building block library $M$, enabling full-tree synthesis.
  • Figure 2: The Recursive Cost Guidance framework uses a cheap model $\hat{c}_B$ to approximate the intractable recursive cost $c_B$ of backward transitions.
  • Figure 3: The dynamic building block library is augmented with high-reward molecules (depicted in pink) that occur during sampling.
  • Figure 4: Synthesis Cost Guidance reduces trajectory length (a), fragment cost (b), and reliance on expensive fragments (d), while concentrating on a smaller fragment subset (c). Results are smoothed over the last 100 iterations.
  • Figure 5: Revisit frequency of high-reward scaffolds increases sharply with Synthesis Cost Guidance (C), indicating greater exploitative behavior. Introducing the Exploitation Penalty (P) effectively reduces this revisit ratio.
  • ...and 12 more figures
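
The Dynamic Library step described in the captions of Figures 1c and 3 can be sketched as follows. This is a rough illustration under stated assumptions: the class `DynamicLibrary`, its methods, the `top_k` parameter, and the use of the mean terminal reward as the "expected reward" of an intermediate are all hypothetical choices, and molecules are represented as plain strings rather than chemical structures.

```python
from collections import defaultdict


class DynamicLibrary:
    """Toy building block library that grows with high-reward intermediates."""

    def __init__(self, building_blocks, top_k=2):
        self.blocks = set(building_blocks)
        self.top_k = top_k  # hypothetical: intermediates added per update
        self._reward_sum = defaultdict(float)
        self._count = defaultdict(int)

    def observe(self, intermediates, reward):
        """Record the terminal reward of a trajectory for each intermediate."""
        for mol in intermediates:
            self._reward_sum[mol] += reward
            self._count[mol] += 1

    def expected_reward(self, mol):
        # ASSUMPTION: expected reward is estimated as the mean terminal
        # reward of the sampled trajectories that passed through `mol`.
        return self._reward_sum[mol] / self._count[mol]

    def update(self):
        """Add the top_k best intermediates not yet in the library."""
        candidates = [m for m in self._count if m not in self.blocks]
        candidates.sort(key=self.expected_reward, reverse=True)
        added = candidates[: self.top_k]
        self.blocks.update(added)
        return added


# Toy usage with placeholder molecule names.
lib = DynamicLibrary(["A", "B"], top_k=1)
lib.observe(["AB"], reward=0.9)
lib.observe(["AB", "ABA"], reward=0.5)
lib.observe(["BA"], reward=0.2)
added = lib.update()
```

Once an intermediate such as `"AB"` is promoted to a building block, later trajectories can start from it directly, which is one way to read the captions' claim that the library enables full-tree synthesis.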