Exploring Synthesizable Chemical Space with Iterative Pathway Refinements

Seul Lee, Karsten Kreis, Srimukh Prasad Veccham, Meng Liu, Danny Reidenbach, Saee Paliwal, Weili Nie, Arash Vahdat

TL;DR

This work tackles a common pitfall of molecular generative models: they often produce unsynthesizable candidates. It introduces ReaSyn, a bidirectional, iterative framework that generates and refines synthetic pathways in both bottom-up (BU) and top-down (TD) directions and then applies an Edit Bridge model to edit entire pathways holistically, thereby projecting input molecules onto synthesizable space. Key contributions include a simple SMILES-based, non-hierarchical pathway representation, a unified autoregressive model capable of both BU and TD sampling, and a discrete-flow Edit Bridge that couples generated pathways to the data distribution. Across three tasks (synthesizable molecule reconstruction, goal-directed optimization, and hit expansion), ReaSyn achieves state-of-the-art reconstruction and optimization performance and improves coverage and diversity in synthesizable chemical space. The approach is relevant to drug discovery because it enables synthesis-aware design, and public code and models are provided for broader adoption.

Abstract

A well-known pitfall of molecular generative models is that they are not guaranteed to generate synthesizable molecules. Existing solutions to this problem often struggle to navigate the exponentially large combinatorial space of synthesizable molecules effectively and suffer from poor coverage. To address this problem, we introduce ReaSyn, an iterative generative pathway refinement framework that obtains synthesizable analogs of input molecules by projecting them onto synthesizable space. Specifically, we propose a simple synthetic pathway representation that allows pathways to be generated in both bottom-up and top-down traversals of synthetic trees. We design ReaSyn so that both bottom-up and top-down pathways can be sampled with a single unified autoregressive model. ReaSyn can thus iteratively refine subtrees of generated synthetic trees in a bidirectional manner. Further, we introduce a discrete flow model that refines the generated pathway at the level of the entire pathway with edit operations: insertion, deletion, and substitution. The iterative refinement cycle of (1) bottom-up decoding, (2) top-down decoding, and (3) holistic editing constitutes a powerful pathway reasoning strategy, allowing the model to explore the vast space of synthesizable molecules. Experimentally, ReaSyn achieves the highest reconstruction rate and pathway diversity in synthesizable molecule reconstruction and the highest optimization performance in synthesizable goal-directed molecular optimization, and significantly outperforms previous synthesizable projection methods in synthesizable hit expansion. These results highlight ReaSyn's superior ability to navigate the combinatorially large synthesizable chemical space.
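To make the two traversal orders and the three edit operations concrete, here is a minimal sketch on a toy synthetic tree. It is not the ReaSyn tokenization: the `Node` class, the labels (`A`, `B`, `C` for building blocks, `R1`, `R2` for reactions), and the helper names are all hypothetical, and the sketch assumes only that bottom-up corresponds to a post-order walk (reactants before the reaction that consumes them) and top-down to a pre-order walk (reaction before its reactants).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy synthetic-tree node (hypothetical): a building block or a reaction."""
    label: str
    children: list = field(default_factory=list)  # reactants; empty for building blocks


def bottom_up(node):
    """Post-order walk: emit reactants first, then the reaction that joins them."""
    out = []
    for child in node.children:
        out.extend(bottom_up(child))
    out.append(node.label)
    return out


def top_down(node):
    """Pre-order walk: emit the reaction first, then its reactants."""
    out = [node.label]
    for child in node.children:
        out.extend(top_down(child))
    return out


def apply_edit(seq, op, pos, token=None):
    """Apply one edit (insert, delete, or substitute) and return a new sequence."""
    seq = list(seq)  # copy so the input pathway is left untouched
    if op == "insert":
        seq.insert(pos, token)
    elif op == "delete":
        del seq[pos]
    elif op == "substitute":
        seq[pos] = token
    else:
        raise ValueError(f"unknown edit operation: {op}")
    return seq


# (A + B) --R1--> intermediate; (intermediate + C) --R2--> product
tree = Node("R2", [Node("R1", [Node("A"), Node("B")]), Node("C")])

bu = bottom_up(tree)   # ['A', 'B', 'R1', 'C', 'R2']
td = top_down(tree)    # ['R2', 'R1', 'A', 'B', 'C']
edited = apply_edit(bu, "substitute", 3, "D")  # swap building block C for D
```

Because both orders serialize the same tree, a single sequence model could in principle be trained on either, which is the property the abstract relies on. The edit operations act on the whole serialized pathway rather than on a single subtree.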

Paper Structure

This paper contains 45 sections, 7 equations, 11 figures, 9 tables, 1 algorithm.

Figures (11)

  • Figure 1: Synthesizable molecule reconstruction results on ZINC250k. Full results are provided in Table \ref{tab:reconstruction}.
  • Figure 2: (a) Bottom-up and top-down traversal of a synthetic tree. (b) Overall framework of ReaSyn. ReaSyn's generation cycle consists of three steps. First, an initial synthetic tree is generated by the autoregressive model in a bottom-up direction. Next, the autoregressive model repredicts a randomly selected subtree in a top-down direction. Finally, the Edit Flow model refines the generated tree in a holistic manner. This process can be repeated multiple times, and the best pathway that yields a product molecule of the highest similarity to the given target molecule is selected as the final solution. The sampling processes of the autoregressive model (the first and the second steps) and the Edit Bridge model are depicted in Figure \ref{fig:butd}(a) and Figure \ref{fig:butd}(b), respectively.
  • Figure 3: ReaSyn adopts an encoder-decoder Transformer architecture. After the encoder encodes the input molecule, the decoder predicts the synthetic pathways of its synthesizable analogs. $\texttt{[START]}$ and $\texttt{[END]}$ tokens are omitted for simplicity. (a) Bidirectional synthetic pathway generation of ReaSyn. ReaSyn's autoregressive model predicts the synthetic pathways in the sequential representation. ReaSyn's training and inference scheme, tailored for bidirectional synthetic pathway generation, makes it possible to designate a specific sampling direction using a single autoregressive model. (b) Holistic pathway editing of ReaSyn. ReaSyn's Edit Bridge model takes the full pathway generated by the autoregressive model and jointly edits the tree skeleton and semantics.
  • Figure 4: Ablation study on synthesizable molecule reconstruction.
  • Figure 4: Synthesizable goal-directed molecular optimization results on the sEH proxy. The results are the means and the standard deviations of 3 runs. The results for the baselines are taken from SynFlowNet (Cretu et al., 2024). The best results are highlighted in bold.
  • ...and 6 more figures
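
The generation cycle described in the Figure 2 caption (bottom-up generation, top-down re-prediction of a subtree, holistic editing, then keeping the best-scoring pathway) can be sketched as a plain loop. This is a hedged illustration, not the released implementation: `refine`, its callable arguments, and the Jaccard stand-in for molecular similarity are all assumptions, and the sketch scores a pathway once per cycle, which the source does not specify.

```python
def jaccard(a, b):
    """Toy similarity on token sets (a stand-in for a molecular similarity)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 1.0


def refine(target, generate, repredict, edit, score, n_cycles=3):
    """Run generate -> repredict -> edit for n_cycles and keep the best pathway.

    All four callables are hypothetical placeholders:
      generate(target)           -> initial pathway (bottom-up step)
      repredict(target, pathway) -> pathway with one subtree redone (top-down step)
      edit(target, pathway)      -> holistically edited pathway
      score(target, pathway)     -> similarity of the pathway's product to the target
    """
    best, best_score = None, float("-inf")
    for _ in range(n_cycles):
        pathway = generate(target)
        pathway = repredict(target, pathway)
        pathway = edit(target, pathway)
        s = score(target, pathway)
        if s > best_score:  # keep the candidate closest to the target
            best, best_score = pathway, s
    return best, best_score


# Deterministic toy stand-ins so the loop can be run end to end.
target = ["A", "B", "C"]
best, best_score = refine(
    target,
    generate=lambda t: ["A", "X"],           # imperfect first draft
    repredict=lambda t, p: p[:-1] + ["B"],   # redo the last part of the pathway
    edit=lambda t, p: p + ["C"],             # insertion edit on the whole pathway
    score=jaccard,
)
```

In a real setting the three stages would be stochastic model samplers, so repeated cycles yield different candidates and the best-of selection becomes meaningful; with the deterministic stand-ins above every cycle produces the same pathway.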