Re-evaluating Retrosynthesis Algorithms with Syntheseus

Krzysztof Maziarz; Austin Tripp; Guoqing Liu; Megan Stanley; Shufang Xie; Piotr Gaiński; Philipp Seidl; Marwin Segler

TL;DR

This work presents syntheseus, a synthesis planning library with an extensive benchmarking framework that promotes best practice by default, enabling consistent and meaningful evaluation of single-step and multi-step synthesis planning algorithms.

Abstract

Automated synthesis planning has recently re-emerged as a research area at the intersection of chemistry and machine learning. Despite the appearance of steady progress, we argue that imperfect benchmarks and inconsistent comparisons mask systematic shortcomings of existing techniques and unnecessarily hamper progress. To remedy this, we present a synthesis planning library with an extensive benchmarking framework, called syntheseus, which promotes best practice by default, enabling consistent and meaningful evaluation of single-step models and multi-step planning algorithms. We demonstrate the capabilities of syntheseus by re-evaluating several previous retrosynthesis algorithms, and find that the ranking of state-of-the-art models changes in controlled evaluation experiments. We end with guidance for future work in this area, and call on the community to engage in the discussion on how to improve benchmarks for synthesis planning.

Paper Structure (27 sections, 5 figures, 2 tables)

Introduction
Prior Work on Benchmarking
Pitfalls and Best Practice for Retrosynthesis Evaluation
Single-Step Models
Multi-Step Search
Syntheseus
Unrestricted single-step model development
Separation of components in multi-step search
Detailed metrics for multi-step search
Experiments: re-evaluation of existing methods
Single-Step
Datasets
Models
Metrics
Setup
...and 12 more sections

Figures (5)

Figure 1: Benchmarking workflows and metrics studied in this work.
Figure 2: Trade-off between top-5 accuracy and inference speed. Circle area is proportional to the number of parameters; color denotes whether a model uses reaction templates (blue), generates a sequence of graph edits (green) or produces the output SMILES from scratch (red). Dashed gray line shows the Pareto front (best result for any time budget). Exact results for Chemformer are not shown as they fall below the plot boundary. We show in-distribution results on USPTO-50K (left) and out-of-distribution generalization on Pistachio (right).
Figure 3: Multi-step search results on the Retro* Hard target set with different single-step models. Left: Time until the first solution was found (or $\emptyset$ if a molecule was not solved). Orange line represents the median, box represents the 25th and 75th percentiles, whiskers represent the 5th and 95th percentiles, and points outside this range are shown as dots. Right: Approximate number of non-overlapping routes present in the search graph (tracked over time and aggregated across target molecules). Solid line represents the median, shaded area shows the 40th and 60th percentiles. On the right-hand side we note the average number of calls made by the model within the allotted time limit.
Figure 4: Results on USPTO-50K in the same format as Figure 2 but extended with top-1, top-3, top-10, top-50, and MRR. Plot for top-5 is reprinted here for convenience.
Figure 5: Results on Pistachio in the same format as Figure 2 but extended with top-1, top-3, top-10, top-50, and MRR. Plot for top-5 is reprinted here for convenience.
