Double-Ended Synthesis Planning with Goal-Constrained Bidirectional Search

Kevin Yu, Jihye Roh, Ziang Li, Wenhao Gao, Runzhong Wang, Connor W. Coley

TL;DR

This work proposes Double-Ended Synthesis Planning (DESP), a novel computer-aided synthesis planning (CASP) algorithm based on bidirectional graph search that interleaves expansions from the target and from the goal starting materials to ensure constraint satisfiability.

Abstract

Computer-aided synthesis planning (CASP) algorithms have demonstrated expert-level abilities in planning retrosynthetic routes to molecules of low to moderate complexity. However, current search methods assume that reaching arbitrary building blocks is sufficient, and so fail to address the common real-world constraint that specific molecules should be used as starting materials. To this end, we present a formulation of synthesis planning with starting material constraints. Under this formulation, we propose Double-Ended Synthesis Planning (DESP), a novel CASP algorithm based on bidirectional graph search that interleaves expansions from the target and from the goal starting materials to ensure constraint satisfiability. The search algorithm is guided by a goal-conditioned cost network learned offline from a partially observed hypergraph of valid chemical reactions. On multiple new benchmarks, we demonstrate that DESP improves solve rates and reduces the number of search expansions by biasing synthesis planning towards expert goals. DESP can make use of existing one-step retrosynthesis models, and we anticipate that its performance will scale as the capabilities of these one-step models improve.
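The core idea of interleaving expansions from both ends can be illustrated with a toy bidirectional breadth-first search. This is a minimal sketch under strong simplifying assumptions and is not DESP itself: each reaction is collapsed to a single edge from a "main" reactant to its product (co-reactants are assumed buyable), so the AND-OR structure of real routes, the one-step retrosynthesis and forward models, and the learned goal-conditioned cost network are all omitted. The reaction set, molecule names (`T`, `S`, `I1`, ...), and function names are hypothetical.

```python
from collections import deque

# Hypothetical toy reaction set: product -> list of (main_reactant, co_reactants).
# Co-reactants are assumed buyable, so each reaction collapses to one edge
# main_reactant -> product. This ignores the AND-OR structure of real routes.
REACTIONS = {
    "T": [("I1", ("B1",)), ("X", ("B2",))],
    "I1": [("I2", ("B3",))],
    "I2": [("S", ("B4",))],
    "X": [("Y", ())],
}


def build_edges(reactions):
    """Return retro (product -> reactants) and forward (reactant -> products) adjacency."""
    retro, forward = {}, {}
    for product, options in reactions.items():
        for main, _co_reactants in options:
            retro.setdefault(product, []).append(main)
            forward.setdefault(main, []).append(product)
    return retro, forward


def _join(meet, top_parent, bot_parent):
    """Stitch the two half-paths into one forward route: goal -> ... -> target."""
    lower, node = [], meet
    while node is not None:          # walk back down towards the goal material
        lower.append(node)
        node = bot_parent[node]
    lower.reverse()                  # now goal ... meet
    upper, node = [], top_parent[meet]
    while node is not None:          # walk up towards the target
        upper.append(node)
        node = top_parent[node]
    return lower + upper


def bidirectional_search(target, goal, reactions):
    """Alternate one retro expansion from the target with one forward expansion
    from the goal starting material, stopping when the two search graphs meet.
    Returns (route from goal to target, expansions) or (None, expansions)."""
    retro, forward = build_edges(reactions)
    top_parent = {target: None}      # node -> product it was reached from
    bot_parent = {goal: None}        # node -> reactant it was reached from
    top_q, bot_q = deque([target]), deque([goal])
    expansions = 0
    while top_q or bot_q:
        for queue, parents, other, adj in (
            (top_q, top_parent, bot_parent, retro),     # retro step
            (bot_q, bot_parent, top_parent, forward),   # forward step
        ):
            if not queue:
                continue
            node = queue.popleft()
            expansions += 1
            for nxt in adj.get(node, []):
                if nxt in parents:
                    continue
                parents[nxt] = node
                if nxt in other:     # frontiers meet: a constrained route exists
                    return _join(nxt, top_parent, bot_parent), expansions
                queue.append(nxt)
    return None, expansions
```

Calling `bidirectional_search("T", "S", REACTIONS)` returns the route `["S", "I2", "I1", "T"]` after three expansions, because the retro search from `T` and the forward search from `S` meet at `I2`. The FIFO queues here are only a placeholder for node selection; DESP instead ranks frontier nodes using learned cost and distance estimates.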

Paper Structure (51 sections, 15 equations, 9 figures, 5 tables, 3 algorithms)

Figures (9)

  • Figure 1: (a) Existing search methods are single-ended and aim to identify a synthetic route in which all leaf nodes meet certain termination criteria, e.g., buyability. (b) DESP is a bidirectional search algorithm that enables a double-ended, starting material-constrained search, better reflecting certain real-world use cases in complex molecule synthesis planning. Empirically, double-ended search finds starting material-constrained solutions with fewer node expansions.
  • Figure 2: (a) DESP algorithm. Evaluation of top nodes is based on both $V_m$ and $D_m$. For F2E search, synthetic distance is calculated between a molecule and the opposing goal, while for F2F, it is calculated based on the closest opposing molecule. (b) Overview of the one-step expansion procedures.
  • Figure 3: Ablation study. (a) Solve rate as a function of the binned complexity of target molecules in Pistachio Hard. (b) Number of forward reactions in DESP routes across all benchmark sets.
  • Figure 4: Example synthetic route for a test case that DESP-F2F solved but Retro* did not. In this case, DESP-F2F matched every step of the reference route.
  • Figure 5: Illustration of data extraction procedure for offline training of $f_t$, $f_b$, and $D$. (a) For each target, the full search graph is enumerated by recursively tracing outgoing edges and propagating Retro* quantities. (b) For each bimolecular reaction with at least one buyable reactant, training examples for $f_t$ and $f_b$ are labeled. For each molecule node $m$ other than the target, $D(m, p^*)$ is computed.
  • ...and 4 more figures
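The Figure 2 caption distinguishes two ways of computing the synthetic-distance term: F2E (presumably front-to-end, the standard bidirectional-search term) measures the distance to the goal at the opposite end, while F2F (front-to-front) uses the closest molecule in the opposing search graph. The sketch below illustrates only that distinction, assuming a generic distance function `dist(a, b)`; the lookup table `TOY_STEPS` is a hypothetical stand-in for the learned distance network, and how $D_m$ is combined with $V_m$ is not modeled here.

```python
from typing import Callable, Dict, Iterable, Tuple

# A distance function D(a, b) estimating synthetic steps between two molecules.
Distance = Callable[[str, str], float]


def f2e_distance(mol: str, opposing_goal: str, dist: Distance) -> float:
    """F2E: distance from a frontier molecule to the goal molecule
    at the opposite end of the search (target or starting material)."""
    return dist(mol, opposing_goal)


def f2f_distance(mol: str, opposing_nodes: Iterable[str], dist: Distance) -> float:
    """F2F: distance to the closest molecule in the opposing search graph;
    infinity if that graph is empty."""
    return min((dist(mol, other) for other in opposing_nodes), default=float("inf"))


# Hypothetical distance table standing in for a learned distance network.
TOY_STEPS: Dict[Tuple[str, str], float] = {
    ("I1", "S"): 2.0, ("I1", "I2"): 1.0,
    ("X", "S"): 5.0, ("X", "I2"): 4.0,
}


def toy_dist(a: str, b: str) -> float:
    """Look up a toy step count; unknown pairs are treated as unreachable."""
    return TOY_STEPS.get((a, b), float("inf"))
```

With the opposing graph containing `S` and `I2`, the F2E distance of `I1` to the goal `S` is `2.0`, whereas its F2F distance is `1.0` because `I2` is closer. Ranking the candidates `["I1", "X"]` by F2F distance therefore selects `I1` for expansion.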