Retro-fallback: retrosynthetic planning in an uncertain world

Austin Tripp, Krzysztof Maziarz, Sarah Lewis, Marwin Segler, José Miguel Hernández-Lobato

TL;DR

The paper tackles retrosynthesis under uncertainty by modeling reaction feasibility and molecule buyability as binary stochastic processes and introducing the successful synthesis probability (SSP), a metric for the probability that at least one synthesis plan works. It then presents retro-fallback, a greedy algorithm that explicitly maximizes SSP using sampled realizations of these processes and a recursive, dynamic-programming estimate of success likelihood. Across USPTO, GuacaMol, and FusionRetro benchmarks, retro-fallback generally outperforms baseline methods in SSP, illustrating the value of planning with backup options under uncertainty. The work highlights practical trade-offs, including slower runtime and the need for accurate feasibility models, and outlines directions for improving realism and scalability in lab-ready retrosynthetic planning.

Abstract

Retrosynthesis is the task of planning a series of chemical reactions to create a desired molecule from simpler, buyable molecules. While previous works have proposed algorithms to find optimal solutions for a range of metrics (e.g. shortest, lowest-cost), these works generally overlook the fact that we have imperfect knowledge of the space of possible reactions, meaning plans created by algorithms may not work in a laboratory. In this paper we propose a novel formulation of retrosynthesis in terms of stochastic processes to account for this uncertainty. We then propose a novel greedy algorithm called retro-fallback which maximizes the probability that at least one synthesis plan can be executed in the lab. Using in-silico benchmarks we demonstrate that retro-fallback generally produces better sets of synthesis plans than the popular MCTS and retro* algorithms.
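
To make the "probability that at least one synthesis plan can be executed" concrete, the following is a minimal Monte Carlo sketch, not the paper's implementation. It assumes independent feasibility and buyability outcomes, which is a simplification of the general stochastic-process formulation. The function names, the plan encoding (a pair of reaction set and purchased-molecule set), and all probabilities are hypothetical; the three plans are ones that can be read off the reactions $r_1$, $r_2$, $r_3$ listed for Figure 1.

```python
import random


def plan_succeeds(plan, feasible, buyable):
    """A plan works if every reaction is feasible and every purchased molecule is buyable."""
    reactions, purchased = plan
    return all(feasible[r] for r in reactions) and all(buyable[m] for m in purchased)


def estimate_ssp(plans, p_feasible, p_buyable, n_samples=10_000, seed=0):
    """Estimate the probability that at least one plan succeeds.

    Assumption: each reaction and molecule outcome is an independent
    Bernoulli draw (hypothetical simplification for illustration).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        # Draw one joint realization of feasibility and buyability.
        feasible = {r: rng.random() < p for r, p in p_feasible.items()}
        buyable = {m: rng.random() < p for m, p in p_buyable.items()}
        if any(plan_succeeds(plan, feasible, buyable) for plan in plans):
            hits += 1
    return hits / n_samples


# Hypothetical toy instance: (reactions used, molecules purchased).
plans = [
    ({"r1"}, {"m_a", "m_b"}),
    ({"r1", "r3"}, {"m_e", "m_b"}),
    ({"r2"}, {"m_b", "m_c", "m_d"}),
]
p_feasible = {"r1": 0.5, "r2": 0.5, "r3": 0.5}  # made-up values
p_buyable = {m: 1.0 for m in ("m_a", "m_b", "m_c", "m_d", "m_e")}  # made-up values

ssp = estimate_ssp(plans, p_feasible, p_buyable)
```

With these made-up numbers, any single plan succeeds with probability at most 0.5, while the set as a whole succeeds whenever $r_1$ or $r_2$ is feasible, i.e. with probability $1 - 0.5 \cdot 0.5 = 0.75$ under the independence assumption. The gap between the two is the value of keeping a fallback plan.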

Paper Structure

This paper contains 115 sections, 9 theorems, 36 equations, 19 figures, 1 table, 2 algorithms.

Key Result

Theorem 3.1

Unless $P=NP$, there does not exist an algorithm to compute $\mathrm{SSP}(\mathcal{P}_{m_\star}({\mathcal{G}}') ; \xi_f, \xi_b)$ for arbitrary $\xi_f,\xi_b$ whose time complexity grows polynomially with the number of nodes in ${\mathcal{G}}'$.
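
For readers without the main text at hand, the quantity in Theorem 3.1 can be written out roughly as follows. This is a paraphrase of the TL;DR's description (the probability that at least one plan works), not necessarily the paper's verbatim definition; the success indicator $\sigma$, the generic plan set $\mathcal{P}$, and the sampled outcomes $f$, $b$ are notation assumed here.

$$\mathrm{SSP}(\mathcal{P}; \xi_f, \xi_b) = \mathbb{P}_{f\sim\xi_f,\, b\sim\xi_b}\left[\exists\, T \in \mathcal{P} : \sigma(T; f, b) = 1\right]$$

Here $\sigma(T; f, b) = 1$ is taken to mean that every reaction in plan $T$ is feasible under $f$ and every starting molecule of $T$ is buyable under $b$.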

Figures (19)

  • Figure 1: a) graph ${\mathcal{G}}'$ with (backward) reactions $m_\star\Rightarrow m_a+m_b\,(r_1)$, $m_\star\Rightarrow m_b+m_c+m_d\,(r_2)$, and $m_a\Rightarrow m_e\,(r_3)$. Dashed box illustrates expansion of $m_c$. b) All synthesis plans in $\mathcal{P}_{m_\star}({\mathcal{G}}')$.
  • Figure 2: Mean SSP across all 190 test molecules vs. time using the SA score heuristic. 3 trials are done for each molecule. Solid lines are sample means (averaged across molecules), and error bars represent standard errors. "ind." means "independent".
  • Figure B.1: Graph and tree representation of the reaction set $m_\star\Rightarrow m_a$ ($r_1$), $m_a\Rightarrow m_b$ ($r_2$), $m_\star\Rightarrow m_c$ ($r_3$), and $m_c\Rightarrow m_\star$ ($r_4$).
  • Figure B.2: OR graph for same set of reactions as Figure B.1.
  • Figure C.1: A search graph ${\mathcal{G}}'$ with values for ${\textnormal{s}}$, $\psi$, and $\rho$ worked out. A detailed explanation is given in the appendix.
  • ...and 14 more figures

Theorems & Definitions (16)

  • Theorem 3.1
  • Corollary C.2
  • proof
  • Lemma C.3
  • proof
  • Proposition C.4
  • proof
  • Proposition C.5
  • proof
  • Theorem D.1
  • ...and 6 more