Generative Flows on Synthetic Pathway for Drug Design

Seonghwan Seo, Minsu Kim, Tony Shen, Martin Ester, Jinkyoo Park, Sungsoo Ahn, Woo Youn Kim

TL;DR

RxnFlow addresses synthesizability in structure-based drug design with a synthesis-oriented Generative Flow Network (GFlowNet) that builds molecules from Enamine-style building blocks via predefined reaction templates. It introduces action-space subsampling to manage a very large action space (over $10^6$ building blocks and 71 templates) and a non-hierarchical MDP that selects the template and the block jointly, enabling robust policy estimation and compatibility with expanding libraries. On CrossDocked2020 pocket-conditional generation, the approach reports state-of-the-art performance, with an average Vina score of $-8.85$ kcal/mol and $34.8\%$ synthesizability. It also demonstrates strong pocket-specific optimization with GPU-accelerated docking, and new objectives can be added without retraining. Overall, RxnFlow provides a practical, adaptable pipeline for synthesis-constrained drug design that balances potency, diversity, and synthetic feasibility. New building blocks are incorporated through action embedding, and the GFlowNet is trained so that molecules are sampled with probability proportional to their reward, $p(x) \propto R(x)$.
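
To make the sampling target concrete, the proportionality $p(x) \propto R(x)$ can be written out with its normalizing constant. The symbols $Z$ and $\mathcal{X}$ are notation assumed here for illustration (the partition function and the set of molecules reachable through the templates and building blocks); they do not appear in the summary above.

$$
p(x) = \frac{R(x)}{Z}, \qquad Z = \sum_{x' \in \mathcal{X}} R(x').
$$

Under this target, a molecule with twice the reward is sampled twice as often, so lower-reward molecules still receive probability mass. This is why the objective yields samples that are diverse as well as highly rewarded, rather than a single reward-maximizing molecule.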

Abstract

Generative models in drug discovery have recently gained attention as efficient alternatives to brute-force virtual screening. However, most existing models do not account for synthesizability, limiting their practical use in real-world scenarios. In this paper, we propose RxnFlow, which sequentially assembles molecules using predefined molecular building blocks and chemical reaction templates to constrain the synthetic chemical pathway. We then train on this sequential generating process with the objective of generative flow networks (GFlowNets) to generate both highly rewarded and diverse molecules. To mitigate the large action space of synthetic pathways in GFlowNets, we implement a novel action space subsampling method. This enables RxnFlow to learn generative flows over extensive action spaces comprising combinations of 1.2 million building blocks and 71 reaction templates without significant computational overhead. Additionally, RxnFlow can employ modified or expanded action spaces for generation without retraining, allowing for the introduction of additional objectives or the incorporation of newly discovered building blocks. We experimentally demonstrate that RxnFlow outperforms existing reaction-based and fragment-based models in pocket-specific optimization across various target pockets. Furthermore, RxnFlow achieves state-of-the-art performance on CrossDocked2020 for pocket-conditional generation, with an average Vina score of -8.85 kcal/mol and 34.8% synthesizability.
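
The action space subsampling mentioned in the abstract can be illustrated with a minimal sketch. It assumes a simplified setting that is not taken from the paper: one flat list of action logits, uniform sampling without replacement, and a single importance weight `n / sample_size`. The function names `logsumexp` and `subsampled_log_partition` are hypothetical, and the authors' actual scheme may differ, for example by using different sampling ratios per action type.

```python
import numpy as np


def logsumexp(x):
    """Numerically stable log(sum(exp(x)))."""
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))


def subsampled_log_partition(logits, sample_size, rng):
    """Estimate log(sum_a exp(logits[a])) from a uniform subsample of actions.

    Illustrative assumption: each sampled action stands in for
    n / sample_size actions, which is the importance weight of uniform
    sampling without replacement.
    """
    n = logits.shape[0]
    idx = rng.choice(n, size=sample_size, replace=False)
    estimate = logsumexp(logits[idx]) + np.log(n / sample_size)
    return estimate, idx


rng = np.random.default_rng(0)
logits = rng.normal(size=100_000)  # stand-in for scores of many building blocks

exact = logsumexp(logits)
estimate, idx = subsampled_log_partition(logits, 1_000, rng)
print(f"exact={exact:.3f}  estimate={estimate:.3f}")
```

Only the sampled actions need to be scored by the policy network, so the cost per step scales with the subsample size rather than with the full library. The estimate becomes noisier as the subsample shrinks.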

Paper Structure (52 sections, 38 equations, 17 figures, 14 tables, 1 algorithm)


Figures (17)

  • Figure 1: Overview of RxnFlow. (a) Synthetic action space, which is represented in a continuous action space. Each colored box corresponds to a reaction template, and the molecules in the box are reactant blocks. (b) Policy estimation using action space subsampling, in the manner of importance sampling. (c) Molecular generation process and model training.
  • Figure 2: Comparison of generation with a modified building block library under (a) a hierarchical MDP and (b) a non-hierarchical MDP. More details are in the appendix figure on the non-hierarchical MDP.
  • Figure 3: Visualization of generated molecules in a zero-shot manner. (a-b) Docking results of generated molecules and known reference ligands of TBK1 (PDB Id: 1FV, SU6). (c) Generative trajectory, which is the generated synthetic pathway of the left molecule in (a).
  • Figure 4: Property distribution of sampled molecules with "all" building blocks and "low"-TPSA building blocks. Vina score was calculated against the KRAS-G12C target.
  • Figure 5: QED reward distribution of generated molecules for each of the "seen", "unseen", and "all" blocks. Additional results are in the appendix figure on scaling results.
  • ...and 12 more figures