FragmentRetro: A Quadratic Retrosynthetic Method Based on Fragmentation Algorithms

Yu Shee, Anthony M. Smaldone, Anton Morgunov, Gregory W. Kyro, Victor S. Batista

TL;DR

FragmentRetro tackles the exponential bottleneck of tree-search retrosynthesis with a bottom-up, fragment-based approach that uses BRICS and r-BRICS fragmentation, stock-aware fragment combinations, and fingerprint-assisted pruning to identify viable precursor sets with complexity $O(h^2)$, where $h$ is the number of heavy atoms in the target molecule. It returns sets of reconstructive fragments rather than explicit reaction DAGs, and offers a formal complexity comparison: $O(b^h)$ for tree search (with branching factor $b$) and $O(h^6)$ for DirectMultiStep, against FragmentRetro’s quadratic scaling, albeit with a linear dependence on stock size. Empirically, FragmentRetro achieves competitive solved rates on benchmarks such as PaRoutes and USPTO-190 and benefits from substantial parallelization in substructure screening, enabling scalable searches on large building block (BB) inventories. The method provides a scalable, DAG-agnostic foundation for retrosynthetic planning that can be integrated with downstream DAG construction and cost-aware extensions in a tiered synthesis-planning pipeline.
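
Fingerprint-assisted pruning relies on a general property of substructure (pattern) fingerprints: if a fragment sets a bit that a stock molecule does not, the molecule cannot contain the fragment, so the expensive substructure match can be skipped. The sketch below illustrates that screen-then-match idea on toy data; it is not FragmentRetro's implementation. The feature names, `FEATURE_BITS`, `STOCK`, `fingerprint`, `exact_match`, and `screen_then_match` are all hypothetical, and a set-containment test stands in for a real substructure matcher.

```python
# Toy "features" mapped to fingerprint bits. "amide" and "ester" share a bit
# on purpose, to mimic the collisions of a real hashed fingerprint.
FEATURE_BITS = {"aryl": 1, "amide": 2, "ester": 2, "amine": 4}

# Hypothetical stock: building-block id -> set of features it contains.
STOCK = {
    "bb1": {"aryl", "amide"},
    "bb2": {"ester"},
    "bb3": {"aryl", "amide", "amine"},
    "bb4": {"aryl", "ester"},
}


def fingerprint(features):
    """OR together the bits of all features (lossy because of collisions)."""
    fp = 0
    for feature in features:
        fp |= FEATURE_BITS[feature]
    return fp


def exact_match(fragment, molecule):
    """Stand-in for an expensive substructure match."""
    return fragment <= molecule


def screen_then_match(fragment, stock):
    """Return (matching ids, number of exact checks actually performed)."""
    frag_fp = fingerprint(fragment)
    matches, exact_calls = [], 0
    for name, molecule in stock.items():
        # Cheap screen: every fragment bit must also be set in the molecule.
        if (frag_fp & fingerprint(molecule)) != frag_fp:
            continue  # cannot be a match, skip the exact check
        exact_calls += 1
        if exact_match(fragment, molecule):
            matches.append(name)
    return matches, exact_calls


if __name__ == "__main__":
    print(screen_then_match({"aryl", "amide"}, STOCK))
```

In this toy, the exact check runs on three of the four stock entries. `bb4` passes the screen only because of the deliberate bit collision and is then rejected by the exact check, which is why a fingerprint screen can prune candidates but never confirm them.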

Abstract

Retrosynthesis, the process of deconstructing a target molecule into simpler precursors, is crucial for computer-aided synthesis planning (CASP). Widely adopted tree-search methods often suffer from exponential computational complexity. In this work, we introduce FragmentRetro, a novel retrosynthetic method that leverages fragmentation algorithms, specifically BRICS and r-BRICS, combined with stock-aware exploration and pattern fingerprint screening to achieve quadratic complexity. FragmentRetro recursively combines molecular fragments and verifies their presence in a building block set, providing sets of fragment combinations as retrosynthetic solutions. We present the first formal computational analysis of retrosynthetic methods, showing that tree search exhibits exponential complexity $O(b^h)$, DirectMultiStep scales as $O(h^6)$, and FragmentRetro achieves $O(h^2)$, where $h$ represents the number of heavy atoms in the target molecule and $b$ is the branching factor for tree search. Evaluations on PaRoutes, USPTO-190, and natural products demonstrate that FragmentRetro achieves high solved rates with competitive runtime, including cases where tree search fails. The method benefits from fingerprint screening, which significantly reduces substructure matching complexity. While FragmentRetro focuses on efficiently identifying fragment-based solutions rather than full reaction pathways, its computational advantages and ability to generate strategic starting candidates establish it as a powerful foundational component for scalable and automated synthesis planning.
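
To make the three asymptotic classes concrete, the sketch below evaluates only their leading terms for a few target sizes. It ignores constant factors and the dependence on stock size, so the numbers indicate growth rates rather than runtimes. The function name `leading_terms` and the branching factor `b = 2` are illustrative assumptions, not values from the paper.

```python
def leading_terms(h: int, b: int = 2) -> dict[str, int]:
    """Leading-order term of each complexity class for h heavy atoms.

    b is an assumed branching factor for tree search (hypothetical value).
    """
    return {
        "tree_search O(b^h)": b ** h,
        "DirectMultiStep O(h^6)": h ** 6,
        "FragmentRetro O(h^2)": h ** 2,
    }


if __name__ == "__main__":
    for h in (10, 20, 40):
        print(h, leading_terms(h))
```

With $b = 2$, the exponential term is still below $h^6$ at $h = 10$ and $h = 20$ but exceeds it by $h = 40$, while the quadratic term at $h = 40$ is only 1600. A larger branching factor moves that crossover to smaller molecules.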

Paper Structure

This paper contains 22 sections, 3 figures, 3 tables, 1 algorithm.

Figures (3)

  • Figure 1: The FragmentRetro algorithm. (a) Cartoon representation of an example molecule processed by BRICS or r-BRICS to yield initial fragments labeled A to F. (b) The FragmentRetro process: In Stage 1, all initial fragments have substructure matches in the stock set. In Stage 2, fragments A–B, B–C, and E–F are valid. In Stage 3, only fragment A–B–C remains valid. Fragments like A–B–D do not need to be checked, since B–D is invalid and therefore A–B–D cannot have a substructure match. There is no Stage 4, as no valid combinations of four fragments are possible in this case. (c) Possible solutions are sorted by the number of fragments, with the most efficient solution on the left. (d) Each valid fragment is associated with a subset of the stock that has substructure matches.
  • Figure 2: FragmentRetro evaluation on Narlaprevir, Martinellic Acid, and Lennoxamine. The fragments from BRICS and r-BRICS are highlighted. One solution for each compound from FragmentRetro is shown. The highlighted fragments remain highlighted in both the solutions and corresponding building blocks, even when some BRICS or r-BRICS cleavage sites are not fragmented in the solutions.
  • Figure 3: Additional evaluation of target compounds using FragmentRetro. This figure adopts the format and compound set from Extended Data Fig. 5 of the Higher-Level Retrosynthesis paper [higherlev_2025] to facilitate direct comparison. The smallest box contains targets solved by the "Original" tree-search algorithm (as defined in [higherlev_2025]); the medium box includes additional compounds solved by the Higher-Level strategy. Compounds outside both boxes were not solved by either method. Compounds successfully solved by BRICS + FragmentRetro and r-BRICS + FragmentRetro are marked with green (BRICS+FR) and blue (r-BRICS+FR) tags, respectively.
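
The staged procedure described in the Figure 1 caption can be sketched in a few lines. The example below is a toy reconstruction under stated assumptions, not the authors' code: the fragment adjacency in `NEIGHBORS` is a hypothetical topology chosen to be consistent with the caption, the `IN_STOCK` lookup stands in for substructure matching against a real stock, and all function names are illustrative.

```python
# Hypothetical adjacency: which initial fragments share a cleaved bond.
NEIGHBORS = {
    "A": {"B"}, "B": {"A", "C", "D"}, "C": {"B"},
    "D": {"B", "E"}, "E": {"D", "F"}, "F": {"E"},
}

# Toy stand-in for "has a substructure match in the stock set".
IN_STOCK = {frozenset(s) for s in
            ("A", "B", "C", "D", "E", "F", "AB", "BC", "EF", "ABC")}


def has_stock_match(combo):
    return combo in IN_STOCK


def is_connected(combo):
    """True if the fragments in combo form one connected piece."""
    combo = set(combo)
    start = next(iter(combo))
    seen, stack = {start}, [start]
    while stack:
        for nxt in NEIGHBORS[stack.pop()] & combo:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen == combo


def run_stages():
    """Stage k holds the valid combinations of k adjacent fragments."""
    valid = {frozenset(f) for f in NEIGHBORS if has_stock_match(frozenset(f))}
    stages = []
    while valid:
        stages.append(valid)
        # Grow each valid combination by one neighbouring fragment.
        candidates = {combo | {nb}
                      for combo in valid
                      for frag in combo
                      for nb in NEIGHBORS[frag] - combo}
        next_valid = set()
        for cand in candidates:
            subs = [cand - {f} for f in cand]
            # Prune: an invalid connected sub-combination rules out cand,
            # so the stock check is never run for it (e.g. A-B-D via B-D).
            if all(sub in valid for sub in subs if is_connected(sub)):
                if has_stock_match(cand):
                    next_valid.add(cand)
        valid = next_valid
    return stages


def solutions(stages):
    """All ways to cover every fragment with valid combos, fewest first."""
    valid = sorted(set().union(*stages), key=sorted)
    found = []

    def cover(remaining, chosen):
        if not remaining:
            found.append(chosen)
            return
        anchor = min(remaining)  # fix one fragment to avoid duplicates
        for combo in valid:
            if anchor in combo and combo <= remaining:
                cover(remaining - combo, chosen + [combo])

    cover(frozenset(NEIGHBORS), [])
    return sorted(found, key=len)


if __name__ == "__main__":
    stages = run_stages()
    for k, stage in enumerate(stages, start=1):
        print("Stage", k, sorted("".join(sorted(c)) for c in stage))
    print("Best:", ["".join(sorted(c)) for c in solutions(stages)[0]])
```

On this toy input, `run_stages()` yields three stages, with A-B, B-C, and E-F valid at Stage 2 and only A-B-C at Stage 3, in line with the caption. `solutions` then lists every way to cover fragments A to F with valid combinations, fewest pieces first.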