Tango*: Constrained synthesis planning using chemically informed value functions
Daniel Armstrong, Zlatko Joncev, Jeff Guo, Philippe Schwaller
TL;DR
The paper addresses starting material-constrained synthesis planning by introducing Tango*, which uses a computed, non-neural node cost to guide a Retro*-based search toward specified starting materials. With a TanSim/FMS-inspired TANGO cost weighted by a single hyperparameter $k$, Tango* achieves higher solve rates, fewer expansions, and lower wall-clock times than the neural-guided Retro* and DESP baselines, and the cost remains effective when integrated into the bidirectional DESP methods. The approach is validated on the USPTO-190, Pistachio Reachable, and Pistachio Hard datasets, and in a case study synthesizing Chlorambucil from renewable or waste feedstocks. The results suggest that chemically informed, non-neural guidance can rival or surpass specialised models in constrained retrosynthesis, with practical implications for waste valorisation and sustainable feedstocks.
Abstract
Computer-aided synthesis planning (CASP) has made significant strides in generating retrosynthetic pathways for simple molecules in an unconstrained fashion. Recent work introduces a specialised bidirectional search algorithm with forward and retro expansion to address the starting material-constrained synthesis problem, allowing CASP systems to provide synthesis pathways from specified starting materials, such as waste products or renewable feedstocks. In this work, we introduce Tango*, a simple guided search that solves the starting material-constrained synthesis planning problem using an existing unidirectional search algorithm, Retro*. We show that by optimising a single hyperparameter, Tango* outperforms existing methods in terms of efficiency and solve rate. We also find that the Tango* cost function yields strong improvements when used in the bidirectional DESP methods. Our method achieves lower wall-clock times while proposing synthetic routes of similar length, where route length is a common metric for route quality. Finally, we highlight potential reasons for the strong performance of Tango* over neural-guided search methods.
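As a rough illustration of what a computed, non-neural node cost could look like, the following minimal sketch scores frontier molecules by their dissimilarity to a set of specified starting materials and scales that score by a weight `k`. It is an assumption-laden toy, not the Tango* cost itself: the functions `tanimoto`, `node_cost`, and `rank_frontier` are hypothetical names, molecules are represented by plain sets of string features standing in for fingerprint bits, only a Tanimoto-style term is used, and the way `k` enters the cost is assumed for illustration.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple


def tanimoto(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    if not a and not b:
        return 1.0  # two empty sets are treated as identical
    return len(a & b) / len(a | b)


def node_cost(features: FrozenSet[str],
              targets: Iterable[FrozenSet[str]],
              k: float) -> float:
    """Hypothetical node cost: k * (1 - best similarity to any target).

    Assumption: a lower cost means the node looks closer to one of the
    specified starting materials. This is not the paper's formula.
    """
    best = max(tanimoto(features, t) for t in targets)
    return k * (1.0 - best)


def rank_frontier(frontier: Dict[str, FrozenSet[str]],
                  targets: List[FrozenSet[str]],
                  k: float) -> List[Tuple[float, str]]:
    """Order frontier molecules from lowest to highest cost."""
    return sorted((node_cost(f, targets, k), name)
                  for name, f in frontier.items())


# Toy data: feature strings are placeholders for fingerprint bits.
target = frozenset({"a", "b", "c", "d"})
frontier = {
    "exact": target,
    "close": frozenset({"a", "b", "c"}),
    "far": frozenset({"x", "y"}),
}
ranked = rank_frontier(frontier, [target], k=2.0)
print(ranked)
```

In a best-first search such as Retro*, a cost of this kind would be used to decide which node to expand next, so molecules resembling the specified starting materials are explored first; larger `k` would give the similarity term more influence relative to other cost terms.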
