
RETRO SYNFLOW: Discrete Flow Matching for Accurate and Diverse Single-Step Retrosynthesis

Robin Yadav, Qi Yan, Guy Wolf, Avishek Joey Bose, Renjie Liao

Abstract

A fundamental problem in organic chemistry is identifying and predicting the series of reactions that synthesize a desired target product molecule. Due to the combinatorial nature of the chemical search space, single-step reactant prediction (i.e., single-step retrosynthesis) remains challenging: even existing state-of-the-art template-free generative approaches struggle to produce an accurate yet diverse set of feasible reactions. In this paper, we model single-step retrosynthesis planning and introduce RETRO SYNFLOW (RSF), a discrete flow-matching framework that builds a Markov bridge between the prescribed target product molecule and the reactant molecule. In contrast to past approaches, RSF employs a reaction center identification step to produce intermediate structures known as synthons, which serve as a more informative source distribution for the discrete flow. To further enhance the diversity and feasibility of generated samples, we employ Feynman-Kac (FK) steering with Sequential Monte Carlo (SMC)-based resampling to steer generation toward promising samples at inference time, using a new reward oracle that relies on a forward-synthesis model. Empirically, we demonstrate that RSF achieves $60.0\%$ top-1 accuracy, outperforming the previous SOTA by $20\%$. We also substantiate the benefits of steering at inference and demonstrate that FK steering improves top-$5$ round-trip accuracy by $19\%$ over prior template-free SOTA methods, all while preserving competitive top-$k$ accuracy results.
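The abstract describes RSF as a discrete flow that transports a synthon-based source toward the reactants along a Markov bridge. The following is a minimal sketch of generic discrete flow matching built on that idea, not the authors' implementation. It assumes that the synthons (source `x0`) and reactants (target `x1`) are encoded as aligned, equal-length sequences of categorical tokens (for example, padded per-atom types), and it uses the simple linear mixture path with κ_t = t. The names `sample_bridge`, `euler_step`, and `generate`, and the `predict_x1` denoiser, are hypothetical stand-ins for a learned network.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical denoiser signature: given current tokens x_t and time t, return
# one probability vector over the vocabulary per position, i.e. p(x1 | x_t).
Predictor = Callable[[List[int], float], List[List[float]]]


def sample_bridge(x0: Sequence[int], x1: Sequence[int], t: float,
                  rng: random.Random) -> List[int]:
    """Draw x_t on the linear mixture path between source x0 and target x1.

    Each position independently takes the target token with probability t and
    keeps the source token otherwise (kappa_t = t). Used to build training pairs.
    """
    if len(x0) != len(x1):
        raise ValueError("x0 and x1 must be aligned sequences of equal length")
    return [b if rng.random() < t else a for a, b in zip(x0, x1)]


def euler_step(x_t: List[int], t: float, jump_prob: float,
               predict_x1: Predictor, rng: random.Random) -> List[int]:
    """Advance x_t by one step: each position jumps, with probability jump_prob,
    to a token drawn from the predicted p(x1 | x_t); otherwise it stays put."""
    probs = predict_x1(x_t, t)
    out = []
    for tok, p in zip(x_t, probs):
        if rng.random() < jump_prob:
            tok = rng.choices(range(len(p)), weights=p, k=1)[0]
        out.append(tok)
    return out


def generate(x0: Sequence[int], predict_x1: Predictor, steps: int,
             rng: random.Random) -> List[int]:
    """Integrate from t = 0 (source, e.g. synthon tokens) to t = 1 (target)."""
    x = list(x0)
    for k in range(steps):
        t = k / steps
        # For kappa_t = t, the jump probability over a step dt = 1/steps is
        # dt / (1 - t) = 1 / (steps - k); it is exactly 1 at the final step.
        x = euler_step(x, t, 1.0 / (steps - k), predict_x1, rng)
    return x
```

With an oracle predictor that always returns the true reactant tokens, `generate` recovers `x1` exactly, because the jump probability reaches 1 at the final step. A trained denoiser would instead return soft distributions, so repeated sampling yields a set of candidate reactants.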

Paper Structure

This paper contains 25 sections, 10 equations, 12 figures, and 9 tables.

Figures (12)

  • Figure 1: An overview of our Retro ProdFlow (RPF) and Retro SynFlow (RSF) frameworks. RPF directly maps a product molecule to reactants via discrete flow. RSF first predicts synthons from the product using a reaction center predictor, then maps these synthons to reactants via discrete flow.
  • Figure 2: Inference time steering with a forward-synthesis reward model.
  • Figure 3: Overview of flow matching denoiser $p_\theta$.
  • Figure 4: Top-$5$ reactants selected by each method. A star indicates an exact match, a checkmark indicates a round-trip match but not an exact match, and a cross means neither.
  • Figure 5: Performance of Retro ProdFlow-RS on the USPTO-50k validation set as we vary the number of particles for SMC resampling. We sample $N = 50$ reactants per product.
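Figures 2 and 5 refer to inference-time Feynman-Kac (FK) steering, in which a population of particles is propagated and periodically resampled according to a reward derived from a forward-synthesis model. Below is a minimal sketch of one common form of SMC resampling, assuming an immediate potential G(x) = exp(λ·r(x)) with λ ≥ 0 and multinomial resampling. The paper's exact potential, resampling schedule, and reward are not given in this section, so `fk_weights`, `smc_resample`, `fk_steer`, `round_trip_reward`, and the lookup-table forward model used below are hypothetical.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, TypeVar

P = TypeVar("P")  # a particle: any partial or complete generation


def round_trip_reward(reactants: str, target: str,
                      forward_model: Callable[[str], Optional[str]]) -> float:
    """Hypothetical reward: 1.0 if the forward model maps the candidate
    reactants back to the target product, else 0.0."""
    return 1.0 if forward_model(reactants) == target else 0.0


def fk_weights(rewards: Sequence[float], lam: float) -> List[float]:
    """Normalized potentials w_i proportional to exp(lam * r_i).
    Subtracting max(r) before exponentiating avoids overflow for lam >= 0."""
    m = max(rewards)
    unnorm = [math.exp(lam * (r - m)) for r in rewards]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def effective_sample_size(weights: Sequence[float]) -> float:
    """ESS = 1 / sum(w_i^2) for normalized weights; equals N for uniform weights."""
    return 1.0 / sum(w * w for w in weights)


def smc_resample(particles: List[P], reward_fn: Callable[[P], float],
                 lam: float, rng: random.Random) -> List[P]:
    """Multinomial resampling: draw N particles with replacement in proportion to
    their potentials, duplicating high-reward particles and dropping low-reward ones."""
    w = fk_weights([reward_fn(p) for p in particles], lam)
    idx = rng.choices(range(len(particles)), weights=w, k=len(particles))
    return [particles[i] for i in idx]


def fk_steer(init: List[P], step_fn: Callable[[P, int], P],
             reward_fn: Callable[[P], float], num_steps: int, lam: float,
             rng: random.Random, resample_every: int = 1) -> List[P]:
    """Propagate N particles with step_fn, resampling every `resample_every` steps."""
    particles = list(init)
    for k in range(num_steps):
        particles = [step_fn(p, k) for p in particles]
        if (k + 1) % resample_every == 0:
            particles = smc_resample(particles, reward_fn, lam, rng)
    return particles


# Toy usage: a lookup table stands in for a forward-synthesis model.
target = "CCOC(C)=O"  # ethyl acetate
forward = {"CCO.CC(=O)O": "CCOC(C)=O"}.get
reward = lambda r: round_trip_reward(r, target, forward)
init = ["CCN.CC(=O)O", "CO.CC(=O)O", "CCO.CC(=O)O", "CC(=O)O"]
steered = fk_steer(init, lambda p, k: p, reward, num_steps=1, lam=50.0,
                   rng=random.Random(0))
```

Here the identity `step_fn` isolates the resampling effect: with a large λ, every particle is replaced by the only candidate that the toy forward model maps back to the target. In a real sampler, `step_fn` would be one denoising step, and the reward would be computed from the model's current prediction of the clean reactants. A smaller λ resamples less aggressively and therefore preserves more diversity among the particles.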