Tango*: Constrained synthesis planning using chemically informed value functions

Daniel Armstrong, Zlatko Joncev, Jeff Guo, Philippe Schwaller

TL;DR

The paper addresses starting material-constrained synthesis planning by introducing Tango*, a Retro*-based search guided by a computed, non-neural node cost that steers it toward specified starting materials. By balancing a TanSim/FMS-inspired TANGO cost with a hyperparameter $k$, Tango* achieves higher solve rates, lower expansion counts, and reduced wall-clock times than the neural-guided Retro* and DESP baselines, and the cost remains effective when integrated into the bidirectional DESP methods. The approach is validated on the USPTO-190, Pistachio Reachable, and Pistachio Hard datasets, and includes a case study synthesising Chlorambucil from renewable or waste feedstocks. The results suggest that chemically informed, non-neural guidance can rival or surpass specialised models in constrained retrosynthesis, with practical implications for waste valorisation and sustainable feedstocks.

Abstract

Computer-aided synthesis planning (CASP) has made significant strides in generating retrosynthetic pathways for simple molecules in a non-constrained fashion. Recent work introduces a specialised bidirectional search algorithm with forward and retro expansion to address the starting material-constrained synthesis problem, allowing CASP systems to provide synthesis pathways from specified starting materials, such as waste products or renewable feedstocks. In this work, we introduce Tango*, a simple guided search that solves the starting material-constrained synthesis planning problem using an existing, uni-directional search algorithm, Retro*. We show that by optimising a single hyperparameter, Tango* outperforms existing methods in terms of efficiency and solve rate. We find that the Tango* cost function catalyses strong improvements for the bidirectional DESP methods. Our method also achieves lower wall-clock times while proposing synthetic routes of similar length, a common metric for route quality. Finally, we highlight potential reasons for the strong performance of Tango* over neural-guided search methods.
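
To make the idea of a computed, non-neural node cost concrete, here is a minimal sketch; it is not the paper's implementation. It assumes a Tanimoto similarity over sets of molecular features, with character bigrams of SMILES strings standing in for real fingerprints. The functions `node_cost` and `priority`, and the way the weight `k` scales the similarity term, are hypothetical choices for illustration only; the actual TANGO cost and the exact role of $k$ may differ.

```python
from __future__ import annotations


def bigrams(smiles: str) -> set[str]:
    """Toy feature set: character bigrams of a SMILES string.

    Assumption: this stands in for a real molecular fingerprint.
    """
    return {smiles[i:i + 2] for i in range(len(smiles) - 1)}


def tanimoto(a: set[str], b: set[str]) -> float:
    """Tanimoto (Jaccard) similarity |A & B| / |A | B| of two feature sets."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def node_cost(molecule: str, starting_material: str) -> float:
    """Hypothetical node cost: 0 if the feature sets are identical,
    1 if they share no features."""
    return 1.0 - tanimoto(bigrams(molecule), bigrams(starting_material))


def priority(cost_so_far: float, molecule: str,
             starting_material: str, k: float) -> float:
    """Hypothetical search priority: accumulated route cost plus the
    k-weighted, similarity-based estimate (lower is expanded first)."""
    return cost_so_far + k * node_cost(molecule, starting_material)


if __name__ == "__main__":
    start = "CCO"  # the specified starting material (toy example)
    frontier = ["CCN", "NN", "CCO"]
    # Order frontier molecules so those most similar to the start come first.
    ranked = sorted(frontier, key=lambda m: priority(1.0, m, start, k=2.0))
    print(ranked)
```

In this sketch a larger `k` makes the search favour molecules that resemble the specified starting material more strongly relative to the cost already accumulated along the route.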

Paper Structure

This paper contains 11 sections, 8 equations, 7 figures, 3 tables, 1 algorithm.

Figures (7)

  • Figure 1: Comparison of existing constrained synthesis planning methods with Tango*
  • Figure 2: Here we demonstrate a meaningful 12-step route generated by our method on a (target, starting material) pair not solved by the best-performing DESP method (Yu et al., 2024). Constrained starting material highlighted in red; bonds/atoms disconnected shown in red.
  • Figure 3: A comparison of node cost estimates for USPTO-190 routes solved and not solved by Retro* search using the corresponding cost function: (a) Tango cost for routes solved by Tango*, (b) SynDist cost for routes solved by Retro* + D, (c) Retro* cost for routes solved by Retro*, (d) Tango cost for routes not solved by Tango*, (e) SynDist cost for routes not solved by Retro* + D, and (f) Retro* cost for routes not solved by Retro*.
  • Figure 4: Here we demonstrate a feasible 10-step route generated by Tango-DESP-F2F on a (target, starting material) pair not solved by the neural-guided DESP-F2F method. Constrained starting material is highlighted in red; bonds/atoms disconnected are shown in red.
  • Figure 5: Here we show a feasible synthesis route to the chemotherapy drug, Chlorambucil, a WHO essential medicine, synthesised entirely from renewable or industrial waste feedstocks.
  • ...and 2 more figures